LLM Watch: Llama 3

The Llama 3 large language model (LLM) is Meta’s most advanced LLM to date. Succeeding Llama 1 and 2, this most recent model excels at understanding and generating human-like text across a wide range of tasks such as question answering, summarization, translation, and computer programming. Meta AI – the AI assistant built into Facebook, Messenger, Instagram, and WhatsApp – currently relies on Llama 3.

Meta has released four versions of Llama 3 so far:

  • Llama 3 8B
  • Llama 3 8B-Instruct
  • Llama 3 70B
  • Llama 3 70B-Instruct

The two 8B models have 8 billion parameters, while the 70B models work with 70 billion. The Instruct models were fine-tuned to better follow human directions, making them better suited for chatbot use than the raw base models. Meta is also working on a 400 billion-parameter version of Llama 3 that the company hopes to make available later in 2024.

Llama 3 was trained on over 15 trillion tokens of content from publicly available sources, seven times the number of tokens used to train Llama 2. Llama 3 also uses a new tokenizer with a 128,256-token vocabulary, a significant upgrade from the 32,000-token vocabulary of earlier models. The larger vocabulary lets Llama 3 encode text more efficiently and better handle contexts of up to 8,192 tokens.

Llama 3 also has a high level of language understanding, especially considering its parameter size. The Massive Multitask Language Understanding (MMLU) metric is a benchmark for evaluating an LLM’s ability to understand language.

Llama 3 8B received a score of 66.6 MMLU, while Llama 3 70B received 79.5 MMLU. These numbers pale in comparison to GPT-4 Turbo’s score of 88.4. However, GPT-4 Turbo reportedly works with 1 trillion parameters. The upcoming Llama 3 400B achieved a score of 86.1, making it competitive with an LLM that has more than double the parameter size.

While Llama 3 is clearly a top contender in the world of LLMs, it does fall short in certain areas. Llama 3 only works with text and is currently unable to understand images, video, and audio. Additionally, Llama 3 is primarily focused on English, and Meta is still developing multilingual capabilities.

There is also some controversy concerning the open-source nature of Llama 3. On one hand, Meta has made the model weights, code, and some training data for Llama 3 publicly available. On the other hand, Llama 3’s licensing terms require companies with over 700 million monthly active users to obtain a separate commercial license from Meta to use Llama 3, and Meta can choose to grant or deny this license at its discretion. Many have argued that this restriction violates the open source definition set by the Open Source Initiative.

Despite certain drawbacks, Llama 3 represents a significant leap forward in language models and it will be interesting to watch as Meta evolves this model further.

How Amazon’s new AI detective tracks down damaged packages before they get to you

Project P.I. illustration

There is nothing worse than receiving an Amazon package only to find the product broken, malfunctioning, or otherwise not in perfect condition. This is especially troublesome when it's an essential product – such as pet food – that you need ASAP. Now Amazon has developed an AI solution to tackle this problem.

On Wednesday, Amazon unveiled Project P.I. (private investigator), which uses generative AI and computer vision to detect product defects before they reach the customer. In addition to checking for product damage, Project P.I. can double-check that the color and size match the customer's order, avoiding misshipments.

Project P.I. is deployed in fulfillment centers across North America and will expand to additional sites throughout 2024. At these sites, millions of products are scanned in imaging tunnels each month. If a defect is found, such as a bent book cover, Amazon isolates the product to ensure it is not sent to the customer, and investigates whether the issue affects similar items.

Then, Amazon associates review the products flagged by Project P.I. to determine whether they have another use, such as being donated or resold at a discounted price on Amazon's Second Chance site, where Amazon sells open-box and certified refurbished products.

In addition to enhancing manual inspections at fulfillment centers and ensuring users get their products in ideal condition, Amazon notes that this initiative helps create a more sustainable experience by preventing returns that lead to wasted packing materials and unnecessary carbon emissions.

"By leveraging AI and product imaging within our operations facilities, we can efficiently detect potentially damaged products and address more of those issues before they ever reach a customer, which is a win for the customer, our selling partners, and the environment," said Dharmesh Mehta, VP of Worldwide Selling Partner Services at Amazon.

Additionally, Amazon is using a multi-modal large language model (LLM) to investigate negative customer experiences. The LLM reviews customer feedback and analyzes images taken from Project P.I. and other sources to confirm the cause of the problem.

Databricks Acquires Data Management Startup Tabular for Over $1 Billion

Databricks, a leading data and AI company, has announced its acquisition of Tabular, a data management startup founded by Ryan Blue, Daniel Weeks, and Jason Reid.

This strategic move aims to unify the two leading open-source lakehouse formats, Apache Iceberg and Delta Lake, to improve data compatibility and interoperability for enterprises.

The financial terms of the deal remain undisclosed, but industry estimates suggest the acquisition is valued between $1 billion and $2 billion.

A Shared Vision of Openness

Both Databricks and Tabular have a strong commitment to open-source formats. Databricks, known for its contributions to open-source projects, has donated 12 million lines of code and is the largest independent open-source company by revenue. This acquisition underscores Databricks’ dedication to open data formats, ensuring companies maintain control over their data and avoid vendor lock-in.

The Rise of Lakehouse Architecture

Databricks pioneered the lakehouse architecture in 2020, integrating traditional data warehousing with AI workloads on a single, governed copy of data.

This architecture, which relies on open formats, has been widely adopted, with 74% of enterprises deploying a lakehouse according to a survey by MIT Technology Review. The foundation of the lakehouse is open-source data formats that enable ACID transactions on data stored in object storage, improving reliability and performance.

Addressing Format Incompatibility

Despite their shared goals, Delta Lake and Iceberg, the two leading open-source lakehouse formats, have developed independently, leading to incompatibility.

This fragmentation has undermined the value of the lakehouse architecture by siloing enterprise data. Databricks aims to address this issue by working closely with the Iceberg and Delta Lake communities to bring interoperability to the formats.

The introduction of Delta Lake UniForm last year was a step towards this goal, providing compatibility across Delta Lake, Iceberg, and Hudi.

Future Plans and Integration

With the acquisition of Tabular, Databricks plans to invest heavily in expanding the ambitions of Delta Lake UniForm. The integration of Tabular’s technology and expertise will help Databricks enhance its data management platform, enabling companies to leverage AI more effectively.

The acquisition is expected to close in Databricks’ second fiscal quarter, subject to customary closing conditions.

This acquisition marks a significant step in Databricks’ strategy to strengthen its position in the market and offer a more powerful and versatile data management and AI solution to its customers.

The post Databricks Acquires Data Management Startup Tabular for Over $1 Billion appeared first on AIM.

Microsoft Announces $3.2B Investment To Expand Cloud and AI Infrastructure in Sweden

In a major boost to Sweden’s economy and AI landscape, Microsoft has unveiled its largest single investment in the country, to be made over the next two years.

The investment of SEK 33.7 billion, or USD $3.2 billion, aims to accelerate Sweden’s adoption of artificial intelligence (AI), enhance workforce skills, and drive long-term economic growth.

Microsoft is significantly expanding its cloud and AI infrastructure by deploying 20,000 advanced GPUs across its data centers in Sandviken, Gävle, and Staffanstorp.

In addition, the company said it is committed to upskilling 250,000 Swedes, representing 2.4% of the population, in AI over the next three years through a variety of technical training, vocational programmes, and expert development courses.

To guide these skill development programs, Microsoft is establishing an AI Insights Council, which will bring together leaders from academia, business, and the public sector. Furthermore, the company is adhering to its AI Access Principles and the newly unveiled Community Pledge, ensuring that AI innovation is conducted responsibly and has a positive impact on local communities.

“Our investment in Sweden is proof of our confidence in this nation, its government and its potential as a leading player in the AI era,” said Microsoft president Brad Smith.

The tech giant’s commitment extends beyond technology, aiming to provide broad access to AI tools and skills for Sweden’s population and economy to thrive.

Sustainability is a key priority, with Microsoft’s Swedish data centres powered by 100% fossil-free energy and innovative water conservation measures. The company has also invested in nearly 1,000 MW of renewable energy production in Sweden, including a hybrid wind and solar project expected to be operational by 2025.

With this massive investment, Microsoft aims to position Sweden as a global leader in AI innovation and technology, accelerating economic growth and fostering responsible progress in the AI era.

This is one of several major foreign investments Microsoft is making in countries’ AI and cloud infrastructure. Just a few months ago, Microsoft committed USD $1.7 billion in Indonesia over the next four years to bolster the country’s cloud and AI infrastructure, provide AI skilling opportunities for 840,000 people, and support the nation’s growing developer community.

The post Microsoft Announces $3.2B Investment To Expand Cloud and AI Infrastructure in Sweden appeared first on AIM.

DSC Weekly 4 June 2024

Announcements

  • Cyberattacks are an unfortunate reality for digital businesses, targeting everything from small companies to the largest enterprises. As digital infrastructure expands and more sensitive information is stored online, security risk management must go beyond prevention to ensure organizations have full visibility of their digital environments and can address incidents in real time. Join our upcoming summit, A Holistic Approach to Endpoint Detection and Response, to get practical EDR strategies to bolster your security posture. You’ll learn how the principle of least privilege, IoT security, and telemetry help protect your endpoints, and receive advice on using advanced forensics and AI-powered investigations to speed response times.
  • As cloud use continues to expand, proper management, monitoring and security is critical to ensure organizations are gaining the most benefit. Organizations navigating digital transformation initiatives know the cloud is a major playing ground for emerging technologies related to cloud migration, cloud security and containerization. Tune into the Managing Hybrid and Multi Cloud Environments Summit to hear leading experts in the field discuss the latest strategies for monitoring complex cloud environments, preventing cyber attacks and security breaches via the cloud, and how to integrate the latest technology trends into cloud usage.

Top Stories

  • The Early Days of the Internet
    June 4, 2024
    by Dan Wilson
    Join Dan Wilson and guest Wes Kussmaul on the AI Think Tank Podcast as they explore computing in the 1980s, the evolution of online privacy, the birth of social media, and the future of digital trust. Discover how PKI and authentic identities can revolutionize internet security in this insightful episode.
  • Data detective work: An anti-money laundering example
    June 3, 2024
    by Alan Morrison
    I’ve been studying the effects of sanctions lately, which has led to a better understanding of how governments are collaborating and sharing data in more substantial ways. In May 2024 Daleep Singh, US Deputy National Security Advisor, International Economics, gave a keynote at a Brookings Institution event titled “Sanctions on Russia: What’s working? What’s not?” Singh’s main point was that sanctions should be seen as one tool in a toolbox. But he does make clear that sanctions against Russia have had a significant impact over the past two years.
  • New Trends in LLM Architecture
    May 31, 2024
    by Vincent Granville
    Since OpenAI/GPT launched in November 2022, many things have happened. Competitors and new applications are born every month, some raising considerable funding. Search is becoming hot again, this time powered by RAG and LLMs rather than PageRank. It remains to be seen who will achieve profitability on a large scale.

In-Depth

  • Build your own chatbot and talk to your own documents
    June 4, 2024
    by Alan Morrison
    Interview with Jans Aasman, CEO of Franz, Inc. Jans Aasman’s AI background goes back to his training as a cognitive scientist at the University of Groningen in the Netherlands, beginning in 1978. Since then, he’s seen over 40 years of AI’s evolution.
  • Master Data Management (MDM) and CRM: Ensuring data quality for enhanced customer relationships
    June 3, 2024
    by Ovais Naseem
    Data quality has become vital in the digital age, where data shapes decisions and business strategies. Customer Relationship Management (CRM) systems, crucial for managing customer interactions and fostering growth, depend heavily on quality data. Here, the fusion of Master Data Management (MDM) and CRM emerges as a potent force.
  • Making AI pay off at the enterprise edge
    June 3, 2024
    by Alan Morrison
    In this interview, Yaung and I discussed the practical approach that NTT takes with clients from a system integrator’s perspective. This interview took place after Upgrade 2024, an annual event that NTT hosted in San Francisco.
  • Empowering Educators with AI Literacy
    June 3, 2024
    by Dan Wilson
    The rapidly evolving landscape of artificial intelligence (AI) presents both opportunities and challenges for educators. Jill Kowalchuk, a K-12 education advisor at the Alberta Machine Intelligence Institute (AMII), recently delivered an enlightening webinar discussing the impact of AI on education. Her presentation emphasized the importance of AI literacy, ethical considerations, and the empowerment of teachers through professional development.
  • Introduction to autonomous agents from a developer perspective – Part one
    June 2, 2024
    by Ajit Jaokar
    What are autonomous AI agents? Autonomous AI agents are systems capable of performing tasks without human intervention. Agents have been around in various incarnations. Most recently, an element of autonomy was achieved by reinforcement learning(RL). However, it is still hard to deploy RL beyond virtual environments and games.
  • AI Dividend, Universal Basic Income, and Economic Multiplier Effect
    June 2, 2024
    by Bill Schmarzo
    We are truly living in unprecedented times. Artificial Intelligence (AI) is anticipated to transform the global economy by intelligently automating tasks, re-engineering operational processes, and paving the way for new avenues of customer, product, service, and market value creation. According to a report by PwC, AI has the potential to contribute up to $15.7 trillion to the global economy by 2030.
  • How Gen AI is transforming education
    May 30, 2024
    by Pritesh Patel
    In recent years, Artificial Intelligence has experienced numerous innovations. One of the most significant is the introduction of Gen (Generative) AI technology. This Artificial Intelligence technology focuses on creating new, original, and valuable content against the given prompt. That’s the reason why it quickly became popular in numerous fields including Education.

Anthropic Launches Tool Use, Making It Easier To Create Custom AI Assistants

Anthropic has announced the general availability of the new Tool Use feature for its AI assistant Claude. The feature enables users to build their own AI-driven solutions: the resulting “AI agents” can hook up to any external API, allowing users to create an email assistant, a bot for online shopping, or other personalized solutions.

The Tool Use feature is available across the entire Claude 3 model family on the Anthropic Messages API, Amazon Bedrock, and Google Vertex AI. Pricing is based on the volume of text Claude processes, measured in “tokens.”

The new feature also has the capability to work with images to analyze visual data. For example, a virtual interior designer can use the tool to process images remotely and provide personalized solutions.

The evolution of Claude from a basic chatbot to a sophisticated AI assistant capable of mimicking a human assistant is truly remarkable. Not only can the tool be used to automate tasks and provide personalized recommendations, but enterprises can also use it to improve efficiency by extracting structured data from unstructured text and answering customer questions by searching databases or using web APIs.

Tool Use also features built-in functions to improve developer experience. This includes the ability to instruct Claude on tool selection and streaming capabilities for real-time responses in applications such as customer support chatbots.

The introduction of the new feature comes at a time when there are a growing number of AI assistants on the market and new features are being rolled out regularly. OpenAI is transforming its chatbot into a voice assistant, while Google recently introduced various AI-driven features for searching and online shopping.

Anthropic has been reportedly testing beta versions of the new feature for the last couple of months. The company shared success stories through a blog post of a few early adopters of Tool Use.

StudyFetch, an AI-native learning platform, uses Claude's Tool Use feature to power its personalized AI tutor, Spark.E. According to Ryan Trattner, CTO and co-founder at StudyFetch, the Tool Use feature has resulted in a “42% increase in positive human feedback” by enabling the platform to act "agentically" to deliver a better UI.

Intuned, a browser automation platform, and Hebbia, an LLM platform for knowledge work, have also benefited from Tool Use with better developer experience and more streamlined customer workflows.

The capabilities offered by Tool Use are impressive, but there could be a few key challenges in using the new feature with existing business systems. Companies will have to streamline their systems and CRMs for smooth integration of Tool Use into their ecosystem.

Claude relies on accurate descriptions of tools to determine which tool to apply based on the user input. This means the users will have to clearly describe each tool in the AI model to enable Claude to select the appropriate tool for the task.
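As a rough sketch of what such a description looks like (the `get_weather` tool and its schema here are hypothetical examples, not taken from Anthropic's documentation), a tool definition for the Messages API pairs a name and a plain-language description with a JSON schema for the expected inputs:

```python
# Hypothetical tool definition in the shape the Anthropic Messages API expects.
# Claude relies on the "description" field to decide when this tool applies,
# so it should state both what the tool does and when to use it.
get_weather_tool = {
    "name": "get_weather",
    "description": (
        "Get the current weather for a city. Use this whenever the user "
        "asks about current conditions or temperature."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'Paris'",
            },
        },
        "required": ["city"],
    },
}
```

In an actual request this dictionary would be passed in the `tools` list of a `messages.create(...)` call; the more precise the description, the more reliably Claude picks the right tool for a given user input.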

While businesses will have to figure out how to leverage the power of Tool Use for their workflows, there is no doubt that its introduction marks a significant step forward in the evolution of human-AI interaction in a business environment. As more companies adopt this technology, we can expect AI to take on a more autonomous role in digital ecosystems.

Organizations that implement AI to augment human capabilities, rather than replace them, will be best placed to benefit from both AI and humans. New AI capabilities such as Tool Use can handle the more complex, repetitive, and data-heavy tasks, while humans focus on creative, strategic, and interpersonal work.

Related Items

Anthropic Breaks Open the Black Box

Amazon Invests Another $2.75 Billion Into Anthropic

Anthropic in Talks with Menlo Ventures to Raise $750M

Is ChatGPT down for you? OpenAI’s chatbot hit by major outage — here’s what we know

ChatGPT just went down for the second time today due to a major OpenAI outage. The artificial intelligence (AI) chatbot has become a major productivity tool for many users. As the US grinds into the workday this morning, many users are unable to access it.

OpenAI reported a major outage this morning that began around 2:30 a.m. ET, but it was resolved just over five hours later after the company pushed out a fix. Not everyone was affected, however, as reports say it appears to have affected mostly logged-in users. The outage was widespread, affecting the web version and the mobile and Mac apps.

Now, OpenAI's status page reports that ChatGPT is suffering another major outage as of 10:33 a.m. ET, though no updates have been reported.

If you find yourself looking for alternative AI chatbots while ChatGPT is down, here's what you can try:

  • Microsoft Copilot: Touted as the best ChatGPT alternative, Copilot is accessible online and can access internet sources. Copilot also uses GPT-4, OpenAI's LLM, but is not affected by the outage.
  • Gemini: This is a good opportunity if you haven't tried Google's chatbot yet. This free AI chatbot has access to Google and replies quickly.
  • You.com: This AI bot is powered by one of the most advanced LLMs and is available for free with web access.

Featured

Embracing Future-proof AI: Nandan Nilekani’s Vision for Businesses

According to Infosys chairman Nandan Nilekani, GenAI offers a great deal of promise to improve productivity and make life easier while reducing the risks connected with this quickly developing technology.

In the company’s annual report, Nilekani stated that corporations must create their apps in compliance with the various rules that govern AI, since regulations are already in place in many countries.

“Given that the leaderboard of technologies will be changing at a bewildering pace, enterprises will have to ‘future proof’ their AI infrastructure. This means designing their AI systems in a way that allows for easy adaptation to new models and technologies, avoiding the risk of being trapped in a technological dead end,” Nilekani said.

“Besides, many of the doomsday prophets pleading for extensive AI regulation have revealed themselves to be just protectionists who want to limit the fruits of GenAI to a few companies and investors,” he said.

A different AI

According to the Infosys co-founder, consumer AI will differ from enterprise AI. The former can be packaged to simplify and increase productivity.

“Unlike the smartphone that brought the magic of apps and touchscreen to billions, consumer AI will push the envelope of usability, convenience, and accessibility for everyone,” said Nilekani in his letter.

Enterprise AI, however, poses a different challenge, since businesses need to organise the data inside their systems so that AI can use it. This can entail controlling data privacy, guaranteeing data quality, and reorganising data formats.

According to Nilekani, ensuring true responses and insights from the data output would also need management, which may require putting in place strong data governance and validation procedures.

Focus on Compliance

AI is now a dominant enterprise technology, and the market for enterprise AI has grown significantly. However, enterprises require more complicated AI-led solutions.

These could include AI-powered ERP software that can streamline tasks and reduce expenses and manual errors, or AI systems that automate mundane tasks, freeing employees and performing root-cause analysis for maintenance problems.

AI has already shaped the ERP landscape significantly, and the ripples can be seen in these types of solutions.

Similar to B2C, the next wave of business AI software makes workers’ lives easier by increasing the orchestration of the knowledge worker labour force.

Instead of workers pulling data from scattered Excel reports, Salesforce reports, or websites, insights are pushed to them: prepackaged, personalised, and actionable.

Everything workers need to know to complete an action sits in one place, in the single channel through which they are most likely to engage.

The post Embracing Future-proof AI: Nandan Nilekani’s Vision for Businesses appeared first on AIM.

5 Tips for Writing Better Python Functions

Image by Author

We all write functions when coding in Python. But do we necessarily write good functions? Well, let’s find out.

Functions in Python let you write modular code. When you have a task you need to perform at multiple places, you can wrap the logic of the task into a Python function. And you can call the function every time you need to perform that specific task. As simple as it seems to get started with Python functions, writing maintainable and performant functions is not so straightforward.

And that’s why we’ll explore a few practices that’ll help you write cleaner, easier-to-maintain Python functions. Let's get started…

1. Write Functions That Do Only One Thing

When writing functions in Python, it's often tempting to put all related tasks into a single function. While this can help you code things up quickly, it’ll only make your code a pain to maintain in the near future. Not only does this make it more difficult to understand what a function does, but it also leads to other issues such as too many parameters (more on that later!).

As a good practice, you should always try to make your function do only one thing—one task—and do that well. But sometimes, for a single task, you may need to work through a series of subtasks. So how do you decide if and how the function should be refactored?

Depending on what the function is trying to do and how complex the task is, you can work out the separation of concerns between subtasks. And then identify a suitable level at which you can refactor the function into multiple functions—each focusing on a specific subtask.

Refactor functions | Image by Author

Here’s an example. Look at the function analyze_and_report_sales:

```python
# fn. to analyze sales data, calculate sales metrics, and write it to a file
def analyze_and_report_sales(data, report_filename):
    total_sales = sum(item['price'] * item['quantity'] for item in data)
    average_sales = total_sales / len(data)

    with open(report_filename, 'w') as report_file:
        report_file.write(f"Total Sales: {total_sales}\n")
        report_file.write(f"Average Sales: {average_sales}\n")

    return total_sales, average_sales
```

It's quite easy to see that it can be refactored into two functions: one calculating the sales metrics and another on writing the sales metrics to a file like so:

```python
# refactored into two funcs: one to calculate metrics, another to write the report
def calculate_sales_metrics(data):
    total_sales = sum(item['price'] * item['quantity'] for item in data)
    average_sales = total_sales / len(data)
    return total_sales, average_sales

def write_sales_report(report_filename, total_sales, average_sales):
    with open(report_filename, 'w') as report_file:
        report_file.write(f"Total Sales: {total_sales}\n")
        report_file.write(f"Average Sales: {average_sales}\n")
```

Now it’s easier to debug any concerns with the calculation of sales metrics and file operations separately. And here’s a sample function call:

```python
data = [{'price': 100, 'quantity': 2}, {'price': 200, 'quantity': 1}]
total_sales, average_sales = calculate_sales_metrics(data)
write_sales_report('sales_report.txt', total_sales, average_sales)
```

You should be able to see the ‘sales_report.txt’ file in your working directory with the sales metrics. This is a simple example to get started, but the separation is especially helpful when you're working on more complex functions.

2. Add Type Hints to Improve Maintainability

Python is a dynamically typed language. So you do not need to declare types for the variables you create. But you can add type hints to specify the expected data type for variables. When you define the function, you can add the expected data types for the parameters and the return values.

Because Python does not enforce types at runtime, adding type hints has no effect at runtime. But there still are benefits to using type hints, especially on the maintainability front:

  • Adding type hints to Python functions serves as inline documentation and gives a better idea of what the function does and what values it consumes and returns.
  • When you add type hints to your functions, you can configure your IDE to leverage these type hints. So you’ll get helpful warnings if you try to pass an argument of invalid type in one or more function calls, implement functions whose return values do not match the expected type, and the like. So you can minimize errors upfront.
  • You can optionally use static type checkers like mypy to catch errors earlier rather than letting type mismatches introduce subtle bugs that are difficult to debug.
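To make that last point concrete, here is a minimal sketch (the `total_cost` function is just an illustration): given the hints below, a static checker like mypy can flag a mismatched call before the code ever runs.

```python
def total_cost(prices: list[float], tax_rate: float) -> float:
    # Sum the item prices and apply a single tax rate.
    return sum(prices) * (1 + tax_rate)

ok = total_cost([10.0, 20.0], 0.1)  # matches the hints; passes a type check
# total_cost("10, 20", 0.1)         # a static checker flags the str argument,
#                                   # which would otherwise fail only at runtime
```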

Here’s a function that processes order details:

```python
# fn. to process orders
def process_orders(orders):
    total_quantity = sum(order['quantity'] for order in orders)
    total_value = sum(order['quantity'] * order['price'] for order in orders)
    return {
        'total_quantity': total_quantity,
        'total_value': total_value
    }
```

Now let's add type hints to the function like so:

```python
# modified with type hints
from typing import Dict, List

def process_orders(orders: List[Dict[str, float | int]]) -> Dict[str, float | int]:
    total_quantity = sum(order['quantity'] for order in orders)
    total_value = sum(order['quantity'] * order['price'] for order in orders)
    return {
        'total_quantity': total_quantity,
        'total_value': total_value
    }
```

With the modified version, you get to know that the function takes in a list of dictionaries. The keys of the dictionary should all be strings and the values can either be integers or floating point values. The function also returns a dictionary. Let’s take a sample function call:

```python
# Sample data
orders = [
    {'price': 100.0, 'quantity': 2},
    {'price': 50.0, 'quantity': 5},
    {'price': 150.0, 'quantity': 1}
]

# Sample function call
result = process_orders(orders)
print(result)
```

Here's the output:

```
{'total_quantity': 8, 'total_value': 600.0}
```

In this example, type hints help us get a better idea of how the function works. Going forward, we'll add type hints for all the better versions of Python functions we write.

3. Accept Only the Arguments You Actually Need

If you are a beginner or have just started your first dev role, it’s important to think about the different parameters when defining the function signature. It's quite common to introduce additional parameters in the function signature that the function never actually processes.

Ensuring that the function takes in only the arguments that are actually necessary keeps function calls cleaner and more maintainable in general. On a related note, too many parameters in the function signature also make it a pain to maintain. So how do you go about defining easy-to-maintain functions with the right number of parameters?

If you find yourself writing a function signature with a growing number of parameters, the first step is to remove all unused parameters from the signature. If there are too many parameters even after this step, go back to tip #1: break down the task into multiple subtasks and refactor the function into multiple smaller functions. This will help keep the number of parameters in check.
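Another way to keep a growing signature in check, when several parameters always travel together, is to bundle them into a small dataclass. Here's a minimal, hypothetical sketch; the Address and ship_order names are ours for illustration:

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str

# Instead of ship_order(order_id, street, city, state, zip_code),
# the related parameters are grouped into one object:
def ship_order(order_id: int, address: Address) -> str:
    return f"Order {order_id} shipping to {address.city}, {address.state}"

print(ship_order(42, Address("1 Main St", "Springfield", "IL", "62701")))
# Order 42 shipping to Springfield, IL
```

The signature shrinks from five parameters to two, and callers can no longer accidentally swap the street and city strings.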

Keep num_params in check | Image by Author

It’s time for a simple example. Here the function definition to calculate student grades contains the instructor parameter that’s never used:

# takes in an arg that's never used!
def process_student_grades(student_id, grades, course_name, instructor):
    average_grade = sum(grades) / len(grades)
    return f"Student {student_id} achieved an average grade of {average_grade:.2f} in {course_name}."

You can rewrite the function without the instructor parameter like so:

# better version!
def process_student_grades(student_id: int, grades: list, course_name: str) -> str:
    average_grade = sum(grades) / len(grades)
    return f"Student {student_id} achieved an average grade of {average_grade:.2f} in {course_name}."

# Usage
student_id = 12345
grades = [85, 90, 75, 88, 92]
course_name = "Mathematics"
result = process_student_grades(student_id, grades, course_name)
print(result)

Here's the output of the function call:

Student 12345 achieved an average grade of 86.00 in Mathematics.

4. Enforce Keyword-Only Arguments to Minimize Errors

In practice, most Python functions take in multiple arguments. You can pass in arguments to Python functions as positional arguments, keyword arguments, or a mix of both. Read Python Function Arguments: A Definitive Guide for a quick review of function arguments.

Some arguments are naturally positional. But sometimes having function calls containing only positional arguments can be confusing. This is especially true when the function takes in multiple arguments of the same data type, some required and some optional.

If you recall, with positional arguments, the arguments are passed to the parameters in the function signature in the same order in which they appear in the function call. So a change in the order of arguments can introduce subtle bugs or type errors.
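To see how this can bite, here's a small hypothetical sketch (the apply_discount function is our own illustration): both parameters are floats, so swapping them raises no error and silently produces a wrong result:

```python
def apply_discount(price: float, discount_rate: float) -> float:
    """Return the price after applying a fractional discount."""
    return price * (1 - discount_rate)

# Correct order:
print(apply_discount(100.0, 0.1))   # 90.0

# Arguments swapped: still runs, but the result is nonsense.
print(apply_discount(0.1, 100.0))   # -9.9

# Passing by keyword makes the order irrelevant and the intent explicit:
print(apply_discount(discount_rate=0.1, price=100.0))  # 90.0
```

The swapped call would sail through any test that only checks "no exception raised", which is exactly why these bugs are subtle.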

It’s often helpful to make optional arguments keyword-only. This also makes adding optional parameters much easier—without breaking existing calls.

Here’s an example. The process_payment function takes in an optional description string:

# example fn. for processing transaction
def process_payment(transaction_id: int, amount: float, currency: str, description: str | None = None):
    print(f"Processing transaction {transaction_id}...")
    print(f"Amount: {amount} {currency}")
    if description:
        print(f"Description: {description}")

Say you want to make the optional description a keyword-only argument. Here’s how you can do it:

# enforce keyword-only arguments to minimize errors
# make the optional `description` arg keyword-only
def process_payment(transaction_id: int, amount: float, currency: str, *, description: str | None = None):
    print(f"Processing transaction {transaction_id}...")
    print(f"Amount: {amount} {currency}")
    if description:
        print(f"Description: {description}")

Let’s take a sample function call:

process_payment(1234, 100.0, 'USD', description='Payment for services')

This outputs:

Processing transaction 1234...
Amount: 100.0 USD
Description: Payment for services

Now try passing in all arguments as positional:

# throws error as we try to pass in more positional args than allowed!
process_payment(5678, 150.0, 'EUR', 'Invoice payment')

You’ll get an error as shown:

Traceback (most recent call last):
  File "/home/balapriya/better-fns/tip4.py", line 9, in <module>
    process_payment(5678, 150.0, 'EUR', 'Invoice payment')
TypeError: process_payment() takes 3 positional arguments but 4 were given

5. Don’t Return Lists From Functions; Use Generators Instead

It's quite common to write Python functions that generate sequences such as a list of values. Where possible, though, you should avoid building and returning full lists, and rewrite such functions as generator functions instead. Generators use lazy evaluation: they yield elements of the sequence on demand rather than computing all the values ahead of time. Read Getting Started with Python Generators for an introduction to how generators work in Python.

As an example, take the following function that generates the Fibonacci sequence up to a certain upper limit:

# returns a list of Fibonacci numbers
def generate_fibonacci_numbers_list(limit):
    fibonacci_numbers = [0, 1]
    while fibonacci_numbers[-1] + fibonacci_numbers[-2] <= limit:
        fibonacci_numbers.append(fibonacci_numbers[-1] + fibonacci_numbers[-2])
    return fibonacci_numbers

This implementation builds the entire list in memory before returning it, which is both wasteful for large limits and more verbose than necessary. Here’s an improved version of the function that uses a generator:

# use generators instead
from typing import Generator

def generate_fibonacci_numbers(limit: int) -> Generator[int, None, None]:
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

In this case, the function returns a generator object which you can then loop through to get the elements of the sequence:

limit = 100
fibonacci_numbers_generator = generate_fibonacci_numbers(limit)
for num in fibonacci_numbers_generator:
    print(num)

Here’s the output:

0
1
1
2
3
5
8
13
21
34
55
89

As you can see, generators can be much more memory-efficient, especially for large input sizes. You can also chain multiple generators together to build efficient data processing pipelines.
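As a quick sketch of such a pipeline (the only_even and squared helpers are our own, not from the original example), each stage pulls values from the previous one on demand, so no intermediate lists are ever built:

```python
from typing import Generator, Iterable

def generate_fibonacci_numbers(limit: int) -> Generator[int, None, None]:
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

def only_even(numbers: Iterable[int]) -> Generator[int, None, None]:
    # Second stage: filter lazily.
    for n in numbers:
        if n % 2 == 0:
            yield n

def squared(numbers: Iterable[int]) -> Generator[int, None, None]:
    # Third stage: transform lazily.
    for n in numbers:
        yield n * n

# Each stage pulls values on demand; nothing is materialized
# until we consume the final generator.
pipeline = squared(only_even(generate_fibonacci_numbers(100)))
print(list(pipeline))  # [0, 4, 64, 1156]
```

The same pattern scales to arbitrarily long pipelines, since each stage holds only one value at a time.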

Wrapping Up

And that’s a wrap. You can find all the code on GitHub. Here’s a review of the different tips we went over:

  • Write functions that do only one thing
  • Add type hints to improve maintainability
  • Accept only the arguments you actually need
  • Enforce keyword-only arguments to minimize errors
  • Don't return lists from functions; use generators instead

I hope you found them helpful! If you aren’t already, try out these practices when writing Python functions. Happy coding!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


Nobody is as Responsible as Microsoft & Google in AI

Recently, on The Decoder podcast, when asked about OpenAI’s Sora potentially being trained on YouTube videos, Google CEO Sundar Pichai agreed it would be inappropriate and implied that such an action would violate YouTube’s terms and conditions.

“We don’t know the details. Our YouTube team is following up and trying to understand. We have terms and conditions and we would expect people to abide by those terms and conditions,” Pichai added.

Against this backdrop, AIM looked at how the big tech companies are developing responsible AI. Google and Microsoft appear to be making significant strides in both AI adoption and addressing ethical concerns.

Based on scores given by AIM, Microsoft and Google rank the highest in terms of responsible AI

Pay the Dues Where Needed

Pichai, expressing his empathy towards creative content creators, said, “I can understand how emotional a transformation this is, and I think part of the reason you saw even, through Google I/O, when we’re working on products like music generation, we have really taken an approach by which we are working first to make tools for artists. So the way we have taken that approach in many of these cases is to put the creator community as much at the centre of it as possible.”

Exactly a month ago, YouTube CEO Neal Mohan confirmed that using YouTube videos for training AI models violates the platform’s terms of service. However, Mohan couldn’t be sure if OpenAI had indeed used YouTube videos.

“From a creator’s perspective, when they upload their hard work to our platform, they have certain expectations… Lots of creators have different sorts of licensing contracts in terms of their content on our platform,” Mohan said.

Additionally, Pichai added that YouTube is essentially a licensing business where Google licenses a lot of content from creators and pays them back through its advertising model. He said the music industry has a huge licensing relationship with YouTube that is beneficial for both sides.

Contrastingly, last year the New York Times filed a lawsuit against OpenAI, alleging unauthorised use of its published work to train its AI models, citing copyright concerns over its written content.

However, OpenAI has since partnered with several news agencies to train its AI models using content from these organisations.

Similarly, Apple has licensed training data from Shutterstock. The deal, reportedly worth between $25 million and $50 million, covers Shutterstock's entire image, video, and music database.

ChatGPT has millions of users.
And now each became even more of a data miner for them.
Especially multimodal data miner. And that is pretty much gold. Just ask Apple. https://t.co/MRrdvvsazr

— Sam Padilla (@theSamPadilla) May 14, 2024

Last year, Apple also began negotiations with major news and publishing organisations, seeking permission to use their material in developing generative AI systems.

Is Openness a Big Factor for Tech Companies?

In a recent Wall Street Journal interview, OpenAI CTO Mira Murati was asked about the kind of data the company had used in Sora. Murati’s response went viral, where she said, “Actually, I am not sure,” elaborating that they had stuck to “publicly available data and licensed data.”

So when *the CTO* of OpenAI is asked if Sora was trained on YouTube videos, she says “actually I’m not sure” and refuses to discuss all further questions about the training data. Either a rather stunning level of ignorance of her own product, or a lie—pretty damning either way! https://t.co/irdbRcmrEp

— Brian Merchant (@bcmerchant) March 14, 2024

With the new GPT-4o model, OpenAI has come under scrutiny due to allegations that it used actress Scarlett Johansson’s voice without permission for one of the model’s voices, Sky. The voice was quickly pulled after users noted its striking similarity to Johansson’s voice in the 2013 film Her.

This highlights that OpenAI currently lacks full transparency regarding its training data, although they are gradually improving in this area.

As mentioned before, OpenAI recently signed content and product partnership agreements with The Atlantic and Vox Media, helping the artificial intelligence firm boost and train its products.

Also, a few days ago, OpenAI struck a deal with News Corp, granting its chatbots access to new and archived material from the Wall Street Journal, the New York Post, MarketWatch, Barron’s, and others.

The deal, closed at $250 million, marks a significant increase from just a few months ago, when OpenAI offered a mere $1 million for media licensing to train its large language models.

Meanwhile, Meta AI chief Yann LeCun recently confirmed that Meta has obtained $30 billion worth of NVIDIA GPUs to train its AI models. Beyond acquiring GPUs, Meta’s current AI efforts centre on refining and training more advanced editions of its Llama 3 models.

Reports also suggest that Meta is considering paying news organisations for better training data, to make its generative AI models, including Meta AI, more effective and competitive.

Meta May Pay News Outlets To Improve AI Training Data Qualityhttps://t.co/MBVa9bzUxM

— TIMES NOW (@TimesNow) May 27, 2024

On a similar note, AI startup Karya, which counts Microsoft among its clients, employs and pays over 30,000 rural Indians to create high-quality speech, text, image, and video datasets for training LLMs in 12 Indian languages.

AI Safety Policies So Far

Recently, OpenAI released its safety policy, which states, “We believe in a balanced, scientific approach where safety measures are integrated into the development process from the outset. This ensures that our AI systems are both innovative and reliable and can deliver benefits to society.”

Similarly, Microsoft developed policies to support responsible capability scaling and collaborated with OpenAI on new frontier models using Azure’s supercomputing infrastructure. It also independently manages a safety review process and participates in a joint deployment safety board with OpenAI that reviews models, including GPT-4.

While Apple doesn’t have an AI safety policy as such, it seems to be trying to correct this with its recent hiring plans. Apple is also expected to partner with OpenAI in the coming weeks, which could spur a formal AI safety policy.

At Google Cloud’s Next ’23, VP of Cloud Security Sunil Potti unveiled GCP’s security strategy built on leveraging Mandiant expertise, integrating security into innovations, and providing expertise across environments.

This expands on the Security AI Workbench, introduced in April, with Google’s Sec-PaLM. Potti emphasised generative AI’s potential to tackle evolving threats, tool proliferation, and talent shortages, enhancing security operations in various applications.

Similarly, AWS’s policy states, “We are committed to developing AI responsibly, taking a people-centric approach that prioritises education, science, and our customers, to integrate responsible AI across the end-to-end AI lifecycle.”

Meanwhile, the responsible AI policy at Meta focuses on five pillars – privacy and security, fairness and inclusion, robustness and safety, transparency and control, and accountability and governance.

OpenAI Has a Safety Board, What About the Others?

Recently, OpenAI formed a safety and security committee responsible for making recommendations on critical safety and security decisions for all OpenAI projects. The discussions revolved around the likely early arrival of GPT-5 and how the committee will serve as a safety bunker for OpenAI.

In addition to being led by OpenAI Board directors, the group will also include technical and policy experts to guide it. However, this announcement came right after OpenAI disbanded its superalignment team led by Ilya Sutskever and Jan Leike.

Similarly, as part of safeguarding AI responsibility, Google has established the Responsible AI and Human-Centred Technology (RAI-HCT) team. This team is tasked with conducting research and developing methodologies, technologies, and best practices to ensure that AI systems are built responsibly.

Recently, a Bloomberg report stated that Microsoft has increased its Responsible AI team from 350 to 400 members to ensure the safety of its AI products. Microsoft also released its Responsible AI report, highlighting the creation of 30 responsible AI tools over the past year, the expansion of its Responsible AI team, and the mandate for teams developing generative AI applications to measure and map risks throughout the development cycle.

Additionally, Microsoft has brought on Inflection AI and DeepMind co-founder Mustafa Suleyman to steer its AI initiatives ethically.

At last year’s AWS re:Invent conference, AWS’s responsible AI lead Diya Wynn highlighted the importance of using AI responsibly. She emphasised creating a culture of responsibility and a holistic approach to AI within organisations.

She cited a recent survey showing that 77% of respondents are aware of responsible AI, and 59% see it as essential for business. However, younger leaders, aged 18 to 44, are more familiar with the concept than those over 45, and only a quarter of respondents have begun developing a responsible AI strategy, with most lacking a dedicated team.

Similar to OpenAI, Meta dispersed its Responsible AI team last year, reallocating members to various groups within the company. However, unlike OpenAI, most team members transitioned to the generative AI sector to continue addressing AI-related harms and support responsible AI development across Meta.

Microsoft Leads The Way

Microsoft has developed extensive policies to support responsible AI, collaborating with OpenAI and independently managing a safety review process, which puts it ahead in the race. However, companies like Meta and Google are doing just as much to ensure their AI tech is safe and ethically built. Soon, with the tide changing, most companies, including Apple and OpenAI, may strengthen their teams to ensure a responsible approach to AI.

The post Nobody is as Responsible as Microsoft & Google in AI appeared first on AIM.