Remini tops the App Store for its viral ‘AI headshots’ but its body edits go too far, some say

Sarah Perez @sarahintampa / 9 hours

Instagram’s Threads’ time at the top of the App Store has come to an end, thanks to an AI photo editing app called Remini that’s going viral on TikTok. First launched in 2019, the app added a generative AI feature last year which TikTok users have recently discovered allows them to create professional headshots for sites like LinkedIn just by uploading their own selfies.

In one video by TikTok user @Gracesplace, for example, the creator shows off how she submitted selfies to create a handful of professional-looking headshots that show her in different outfits and poses. The video has since blown up on TikTok, topping 42.9 million views.


As others replicated her experience, demand for the AI photo app sent it soaring into the No. 1 Overall position on the U.S. App Store on July 11th, overtaking Instagram Threads only days after Meta’s popular new Twitter competitor launched, according to data from app analytics firm Apptopia. Over the past 30 days, Remini has seen 22+ million worldwide downloads. As of yesterday, the firm estimates the app was seeing over 20 million daily active users.

Another analytics firm, data.ai, saw Remini overtake Threads on July 10th. It also found that Remini saw 2,131,000 U.S. downloads across iOS and Android in the past 10 days. From June 9-15, data.ai estimates Remini made $3,730,000 in consumer spending, a 1,055% week-over-week increase.

On the U.S. App Store, Remini has a 4.6-star rating across 124,000 reviews.

To create the AI headshots, users first pick 8 to 12 photos of themselves, including close-up selfies featuring a variety of backgrounds, expressions, and face angles, and then select their gender. Next, they pick a “model” image that shows the overall style and composition of the photos they want generated. To generate professional shots, there are a number of options available under the subheading “curriculum,” but other styles are available, like “trendy,” “travel,” “casual,” and the questionable “Korean aesthetic,” among others.

The AI will take several minutes to generate the shots. When complete, you can swipe left and right as you view the results — similar to how you like or dislike your possible matches on Tinder.

This process is reminiscent of another AI photo app that also went viral on TikTok last year: Lensa AI, which used your own photos to create “magic avatars” — or artistic renderings of users’ own images. However, that app raised alarm with artists who said the app was using their art as its training data. In addition, users found they could generate NSFW images with the AI.

Similar to Lensa AI, Remini’s app is also a free download but using its full set of features requires a subscription. Currently, it’s priced at $9.99/week or $79.99 per year. (Likely, some users have been working around the payment requirement by starting a free trial, generating their AI images, then canceling their subscription.)

As the Remini headshots trend blew up on TikTok, some users were unhappy with the app’s AI results. Many people found the app was making unwelcome adjustments to their bodies in the resulting photos. For instance, some women remarked the app was making them much skinnier while others noticed it was greatly enhancing the size of their breasts or cleavage.

“All I wanted were some linkedin head shots [sic],” wrote TikToker @juliak528 in her video. “Why did AI give me a boob job,” the video description read, followed by crying emojis.

“No one talks about dangerous this AI trend is for your mental health and body dismorphia [sic],” noted TikToker @spookyhorrorqueen, commenting on how the AI was returning photos showing a differently-sized body.

“The viral AI headshot filter made me about 105lbs and just…its bad,” added another creator, @northstarnotes, after the AI photos made her body smaller.

Others, however, said they were using their AI “skinny” photos as weight loss motivation.

Milan-based Remini was not immediately available for comment regarding whether it is working to address the mistakes the AI has been making. (We’ll update if the company responds.)

There are other concerns with the app as well, as some users thought the AI results were good enough that it didn’t make sense to pay a photographer for professional photos — something that speaks to the broader implications of easily accessible AI technology.

According to Remini’s website, generative AI photos of yourself aren’t the app’s only selling point. The app can also be used to restore old photos by adding detail, enhance face quality, improve the colors and tones of photos, enhance the background, and more. On the App Store, Remini also promotes an official Barbie filter that turns your selfies into photos inspired by the new Barbie movie.

OpenAI Introduces ‘Custom Instructions’ for ChatGPT 

OpenAI just introduced a new feature called ‘custom instructions’, aimed at giving users more control over ChatGPT’s responses. With custom instructions, users can get by with fewer prompts: ChatGPT will remember conversation context based on your chosen preferences, making interactions more efficient and allowing for a more personalized, tailored AI experience.

For instance, a teacher could indicate that they are teaching sixth-grade math, or a developer could specify their preferred programming language when seeking suggestions. Users can also provide information about their family size, enabling ChatGPT to tailor responses about meals, grocery shopping, and vacation planning to their specific needs.

Starting today, the beta version of this feature will be accessible to users with the Plus plan, and it will gradually expand to include all users in the following weeks.

Interestingly, this update coincides with OpenAI’s decision to double the number of messages that ChatGPT Plus customers can send to GPT-4 in a given time frame. Starting next week, users will be able to send up to 50 messages per 3 hours, further expanding the possibilities for AI-powered conversations.

Notably, the company said it will use custom instructions to improve model performance for users. However, “you can disable this via data controls”, the company added.

Plus users can start using custom instructions today by opting into the beta for custom instructions:

On the web, click on your name → Settings → Beta features → opt into Custom instructions. Custom instructions will appear in the menu when you click on your name going forward.

On iOS, go to Settings → New Features → turn on Custom instructions. Custom instructions will appear in settings.

The post OpenAI Introduces ‘Custom Instructions’ for ChatGPT appeared first on Analytics India Magazine.

How much coding is needed in a data science career?

The most common question among people who do not come from a technical background is how much coding is required to ace a data science career path. If you have the same question, you are not alone. The surprising answer is “it depends.” Coding is unarguably a crucial skill and vital tool for data scientists. But depending on the role or the company, coding may not be required. Contrary to popular belief, coding is not always a prerequisite for data science.

If you are curious whether a data scientist needs coding skills, read on. This blog explores the answer to that question and its significance.

Does data science require coding?

Data science roles have traditionally required programming skills, and most experienced data scientists still code today. The profession is still growing, but the data landscape is changing: technologies now enable people to work on data projects without typing a line of code.

These technologies are not intended to replace coding skills in data science but rather to make data analysis more accessible to those with less technical background. Used as intended, they free data scientists to continue writing code for complex and bespoke solutions.

What are the programming languages used in data science?

It is crucial to have a basic understanding of each data science language. So, let’s take a closer look at each one:

1. Python

Python is the most widely used programming language in data science, and its use extends well beyond scientific research. Python is a powerful tool for machine learning, data analytics, and data visualization, and it is also widely used in other software engineering applications. Learning Python can be almost as easy as learning to read and write English. Its open-source nature and readability make it a popular language for data scientists and other tech professionals, who use it extensively.

Data scientists are often in need of saving time through automation. Python language is a fantastic tool for automating tasks.
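As a concrete illustration of that automation point, here is a minimal sketch of a repetitive task scripted in plain Python using only the standard library. The `summarize_column` helper and the CSV file name are hypothetical, invented for this example rather than taken from any particular library:

```python
import csv
import statistics

def summarize_column(path, column):
    """Read a CSV file and return basic statistics for one numeric column."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical usage: summarize_column("sales.csv", "amount")
# would replace re-opening the file in a spreadsheet every week.
```

Once a helper like this exists, the same report can be regenerated for any number of files with a loop instead of manual effort.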

2. R

R is a scripting language that is:

· Open-source

· Widely supported

R can be an excellent tool for data scientists managing large, complex data sets. R is the best language for data scientists who combine statistical computing with mathematics and graphics. This language provides its programmers with extensive packages, libraries, and other tools suitable for quantitative applications.

3. SQL

According to Zdnet.com, SQL is second in importance for data scientists after Python. Familiarity with SQL is essential because the industry uses it to interface with relational database systems, and data science professionals need to be able to query databases. Aspiring data scientists should therefore have at least a basic understanding of SQL. Data scientists are often required to use it when dealing with structured data, writing SQL queries or scripts to automate tasks like:

· Aggregating data

· Calculating averages

· Finding the maximum and minimum values in a dataset

SQL can also be used to store data in databases or extract data from databases.
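To make those tasks concrete, here is a small, self-contained sketch that runs exactly these kinds of SQL queries from Python using the standard-library `sqlite3` module. The `sales` table and its rows are invented for the example:

```python
import sqlite3

# In-memory database with a hypothetical "sales" table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 150.0), ("west", 90.0)],
)

# Aggregate totals per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Average, minimum, and maximum across the whole table.
avg, lo, hi = conn.execute(
    "SELECT AVG(amount), MIN(amount), MAX(amount) FROM sales"
).fetchone()
```

In a production setting the same queries would run against a server-based relational database; SQLite simply keeps the sketch runnable without any setup.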

4. Java

Data scientists may choose to use Java to perform tasks related to:

· Machine learning

· Data analysis

· Data mining

This is a good choice when these applications need to be integrated into larger development projects. Java offers many libraries for data mining, machine learning and other applications. Scala, a related language that runs on the Java Virtual Machine and interoperates with Java, further extends data scientists’ ability to manipulate large datasets, and it offers an extensive set of valuable, well-supported libraries.

5. C/C++

C and C++ underpin many of today’s programming languages, so a data scientist benefits from a solid foundation in C. C/C++ also offers advantages such as the ability to process data quickly and efficiently, making it a good choice for data scientists working on projects that need high performance or massive scalability.

Benefits of coding in data science

Data scientists should have a solid understanding of coding, an essential part of the data science process. Data science programming requires in-depth knowledge of computer science, mathematics and statistics concepts and techniques. Data science coding enables you to build robust algorithms to automate and solve complex problems.

So, the main benefits of learning to code for the data scientist role are:

· Data science coding allows faster and more accurate analysis of large datasets than manual methods.

· Data science coding allows you to automate repetitive tasks and free up time for other aspects of your job.

· Data science coding helps you to obtain better insight into data and offers more access to the manipulation and analysis of data.
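As a small illustration of those first two points, here is a sketch in plain Python that groups records and computes per-group averages, the kind of analysis that would be tedious and error-prone to do manually. The `group_means` helper and the field names are hypothetical:

```python
from collections import defaultdict
import statistics

def group_means(records, key, value):
    """Group a list of dicts by `key` and compute the mean of `value` per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    return {k: statistics.mean(v) for k, v in groups.items()}

# Hypothetical usage with survey-style records:
# group_means(responses, "department", "satisfaction")
```

The same few lines work unchanged whether the input holds ten records or ten million, which is precisely the speed and repeatability advantage coding offers over manual methods.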


Conclusion

You must have the right attitude and skills to be a successful data scientist. Data scientists must be experts in data-science programming, have a strong understanding of maths and statistics, and maintain a consistent workflow.

They must also be able to visualize their data to make their insights easier to understand. Coding is only one aspect of becoming a data scientist. These challenges may seem overwhelming, but with the right help and resources, anyone can become a successful Data Scientist.

Difference Between Modern and Traditional Data Quality – DQLabs


Modern data quality practices use new technology, automation, and machine learning to handle a variety of data sources, ensure real-time processing, and foster collaboration among stakeholders. They prioritize data governance, continuous monitoring, and proactive management to ensure accurate, reliable, and fit-for-purpose data for informed decision-making and corporate success.

Modern data quality practices differ from traditional data quality approaches in several ways:

  • Data Sources and Types:
    Traditional data quality primarily focused on structured data from internal systems or databases.
    Modern data quality practices encompass a wide range of data sources, including unstructured data, external data, social media data, IoT data, and more. The variety of data types and sources has expanded significantly in the modern data landscape.
  • Scale and Volume:
    With the advent of big data and increased data generation, modern data quality practices address the challenges of processing and managing massive volumes of data. Traditional approaches were not designed to handle such scale, whereas modern practices leverage technologies like distributed processing and cloud computing to efficiently manage and analyze large datasets.
  • Real-time and Near-real-time Processing:
    Traditional data quality processes often operated in batch mode, with periodic data cleansing and validation. Modern data quality emphasizes real-time or near-real-time processing, enabling organizations to detect and address data quality issues as they occur. This is crucial in dynamic environments where data is constantly changing and requires immediate attention.
  • Automation and Machine Learning:
    Modern data quality practices leverage automation and machine learning techniques to enhance data quality processes. Automation enables the efficient execution of repetitive tasks such as data cleansing, validation, and standardization. Machine learning algorithms can learn patterns and anomalies in data, enabling automated detection of data quality issues and predictive data quality management.
  • Data Governance and Data Stewardship:
    Modern data quality recognizes the importance of data governance and data stewardship as fundamental components of data quality management. Data governance frameworks establish policies, procedures, and responsibilities for managing data quality throughout the organization. Data stewards are assigned to ensure adherence to these policies and to drive data quality initiatives.
  • Collaboration and Cross-functional Involvement:
    Unlike traditional approaches where data quality was primarily an IT function, modern data quality practices involve collaboration among various stakeholders. This includes business users, data analysts, data scientists, and subject matter experts. Collaboration ensures that data quality requirements are aligned with business needs and that data quality efforts address the specific goals of different departments or projects.
  • Data Quality as a Continuous Process:
    Modern data quality practices emphasize the concept of continuous data quality management. Rather than treating data quality as a one-time activity, organizations continuously monitor, measure, and improve data quality. This involves ongoing data profiling, validation, data quality monitoring, and feedback loops to ensure sustained data quality over time.
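As a toy illustration of the automated-detection idea described above, here is a minimal Python sketch that flags statistical outliers as potential data quality issues. The `flag_outliers` helper and the z-score threshold are illustrative choices for the sketch, not a production-grade data quality method:

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean,
    a simple automated data-quality check for numeric columns."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # No variation, nothing to flag.
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

A continuous-monitoring pipeline would run a check like this on each new batch of data and route any flagged values to a data steward for review, closing the feedback loop the list above describes.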

Overall, modern data quality practices adapt to the changing data landscape, incorporating new data types, handling larger volumes of data, and leveraging automation and advanced analytics. They prioritize real-time processing, collaboration, and continuous improvement to ensure high-quality data that supports informed decision-making and business success.

Google Bard Becomes More Available, Refinable, Shareable and Visual

Image: Andy Wolber/TechRepublic

In mid-July 2023, Bard became available to adults with Google accounts in more places, including Brazil and everywhere in the European Union. Bard is Google’s experimental entry in the competitive world of conversational AI systems such as OpenAI’s ChatGPT, Anthropic’s Claude and Inflection AI’s Pi.

To try Bard, sign up in a browser with a Google Account from a supported country. If you use a Google Workspace account, you may want to check with your administrator because some organizations may limit or prohibit access. Once signed in to Bard, you may explore the features detailed below.

Jump to:

  • Chat with Bard or listen to a response in more than 40 languages
  • Refine a Bard response quickly
  • Pin and share Bard conversations
  • Prompt Bard with an image

Chat with Bard or listen to a response in more than 40 languages

Bard can chat and respond in more than 40 languages as of mid-July 2023. The 10 most widely spoken languages in which Bard is available are:

  • Chinese (Simplified and Traditional)
  • English
  • Hindi
  • Spanish
  • Standard Arabic
  • Bengali
  • French
  • Russian
  • Portuguese
  • Japanese

The above list is ordered roughly by the total number of speakers of each language. Bard includes speech-to-text and text-to-speech support (Figure A).

Figure A

Select the speaker (indicated with a circle) to have Bard read a response. Select the microphone (indicated with a square) to speak your prompt.

To prompt Bard with your voice, make sure the input box is active (the cursor is in it), then choose the microphone icon and talk to enter your prompt. If necessary, edit the text, then select the submit icon.

To have Bard read a response (text-to-speech), select the speaker icon in the upper-right corner near the View Other Drafts menu. Select the speaker icon again to pause the reading.

Refine a Bard response quickly

Bard usually offers a quick way to refine a response: Select the Modify Response button (Figure B) and choose one of the five options:

  • Shorter
  • Longer
  • Simpler
  • More casual
  • More professional

Figure B

Select the Modify Response button and then choose how you want Bard to adjust a response.

The first two options, Shorter and Longer, modify the length of the response, while the other three modify the tone of the response. The Simpler option tends to omit some explanatory text, while More casual generates more informal language, such as what you might use with a friendly colleague. More professional tilts toward the use of active voice and more specific or technical terms. Bard tags on a brief explanation of the changes made, so you may want to omit that explanation if you copy the content.

Pin and share Bard conversations

How to pin Bard conversations

You can pin any conversation (Figure C) to place it above the list of all recent Bard chats.

Figure C

To the right of a recent chat, select the More menu (circled), then Pin to make a conversation easier to access. Pinned chats display first in the list.

Separating a few chats helps you continue a Bard conversation thread on a particular topic, because different topics rely on a distinct series of prompts and responses. For example, a series of prompts exploring machine learning will be unrelated to a prompt series about remote work policies. Pin each different chat and then, when you return, select the appropriately pinned conversation and continue with your next prompt.

How to share Bard conversations

You may now create and share a public link to your Bard chats. Select the share icon and the Share option (Figure D) and then choose whether to link to a single Bard prompt-and-response or an entire conversation.

Figure D

Select the share icon and then the Share option to create a public link to a conversation or a specific prompt-and-response.

You may also modify the display headline of your shared content. The publicly accessible link displays Bard content as originally formatted, including images. This differs from copying a response (i.e., More menu | Copy), which omits images. The Share option lets you show people exactly what’s possible with Bard.

Prompt Bard with an image

Bard now accepts uploaded JPEG, PNG or WebP images along with prompts (Figure E).

Figure E

Add an image and leverage the power of Google Lens as part of your prompting process.

Upload an image and then reference it in the prompt by asking Bard to identify places or things, extract text or search for information relevant to the image. For example, you might upload a photo of a book and then ask for more information about it. Bard often can translate text in images or suggest captions.

Google notes that the system leverages the capabilities of Google Lens. Image capabilities are available only in English editions of Bard, for now.

If you have access to Bard and have experimented with it, message or mention me on Mastodon (@awolber) to share your experience with Bard with me.


Unlock DataOps Success with DataOps.live: Featured in Gartner Market Guide!

Sponsored Post

We have fantastic news to share with you! DataOps.live has been featured in the highly esteemed Gartner Market Guide for DataOps Tools, released on December 5, 2022. This recognition signifies a major milestone in the DataOps market and solidifies DataOps.live as an essential player in this evolving landscape.

Gartner Market Guides are invaluable resources for staying ahead of emerging market trends. With over 100 Market Guide research notes, Gartner provides strategic leaders like you with a comprehensive view of various markets, both mature and smaller, in an easily digestible format.

If you are utilizing Snowflake to build your data infrastructure, applications, data products, and analytic frameworks, it is essential to familiarize yourself with DataOps. Understanding the methodology, requirements, and best practices of DataOps is crucial because it brings the same transformative effects to DataOps as DevOps did to software applications.

To learn more about the DataOps market, download your free copy of the Gartner Market Guide for DataOps Tools.



Google pitched a news-writing AI tool to New York Times, Washington Post


According to an article from the New York Times, Google is testing an AI tool that can write news content. The article states that Google demonstrated the product to executives at the New York Times, The Washington Post, and News Corp, the owner of The Wall Street Journal.

Google named the product Genesis, though the name is an internal title. Genesis can collect information about current events and produce news articles. According to The New York Times, people familiar with the tool say Google is marketing it as a supplemental tool to journalists, not a replacement for human-written articles.

Also: 5 ways to explore the use of generative AI at work

Allegedly, some executives were not impressed with the tool and believed Google was trying to downplay the knowledge and effort it takes to be a journalist.

"Quite simply, these tools are not intended to, and cannot, replace the essential role journalists have in reporting, creating, and fact-checking their articles," Jenn Crider, a Google spokesperson, said in a statement.

Google says the AI tool could help journalists generate headlines and help them strengthen their writing styles.

Also: This is how generative AI will change the gig economy for the better

Some newsrooms have been using AI-generated content for years. For example, The Associated Press uses AI to automate stories that cover corporate earning reports and some sporting events. The Associated Press also uses AI technology to help transcribe audio and video from live events.

But recently, publications that used AI to generate stories better left to human writers and editors have faced fierce backlash for publishing mistake-ridden, AI-generated content.

Journalism relies on journalists to be knowledgeable and ethical when reporting on the news. Generative AI tools cannot be relied on to do the same, as they can hallucinate or output misinformation.

Also: Need an AI essay writer? Here's how ChatGPT (and other chatbots) can help

Google's Bard, the company's competitor to OpenAI's ChatGPT, delivered false information to some users. Although hallucinations are a side effect of the technology in its current form, newsrooms with a prestigious reputation to preserve will likely be hesitant to adopt it widely.


Whop, an online marketplace for digital goods, raises $17M

Kyle Wiggers / 8 hours

Amazon might have the monopoly on physical goods sold online. But what about digital ones?

There’s Steam for games and software. Shopify supports some forms of digital goods, like artwork and gig services. But three co-founders, Steven Schwartz, Cameron Zoub and Jack Sharkey, believe that there’s room for competition.

Schwartz, Zoub and Sharkey are the creators of Whop, a marketplace for people to sell access to digital products. Products for sale — and re-sale — run the gamut from sports gambling picks and deals on food, travel and credit cards to tips to “level up your social game.”

“Whop is a comprehensive online platform aimed at connecting sellers and buyers within the digital economy,” Schwartz told TechCrunch in an email interview. “Its mission is to centralize all products on the internet, offering a one-stop solution for anyone looking to participate in the digital economy.”

Zoub and Schwartz met when they were 13 years old in a Facebook group over a shared interest in limited-edition sneakers. Together, they launched one of the first “sneaker bots” — software to get shoes before they sold out — and used the profits to bootstrap the creation of more software to sell online.

After teaming up with Sharkey, a software developer, to build products for small businesses, Zoub and Schwartz created a makeshift marketplace where people could buy software — and sell their own — for free. But scammers overran it.

“It was trash,” Schwartz said bluntly. “People had to make forum posts and often got scammed, a middleman was required and pricing for the software wasn’t clear.”

So Schwartz, Zoub and Sharkey began working on an improved version of the marketplace, which became Whop.

“We’re creating a new economy, giving people new things to sell,” Schwartz said. “We see ourselves as competing with social media, where people have traditionally gone to sell their software, but suffer through an incredibly suboptimal experience.”

Whop’s online marketplace for selling digital goods and services.

Given the countless goods and services marketplaces out there, one might wonder what makes Whop — besides the amusing name — different. (This writer did.) Schwartz claims that Whop is differentiated by its selling experience and product discovery engine.

Sellers on Whop get a dashboard with promotion and customer relationship management tools as well as analytics for business insights. As for buyers, they’re treated to a recommendation algorithm, visualizations for discovering new products and a portal for managing their purchases.

Sound par for the course? Perhaps. But Whop’s going after a different audience than your typical marketplace: influencers and content creators.

“People are selling sponsorships or ad space on traditional social channels, but now they can use Whop to offer their audience a real, living and breathing product that they can collect a recurring, stable income stream on,” Schwartz said. “If someone has a million followers, they think they’re better off continuing to post content, but in reality, someone who has 20,000 followers with a real product is actually making more money than them.”

Is there truth to that last statement? Perhaps. What does appear to be a trend is that buyers are more likely to buy a product that’s recommended to them by an influencer they trust. According to one source, 49% of consumers depend on influencer recommendations, while 40% say that they’re likely to purchase something after seeing it on Twitter, YouTube or Instagram.

In my cursory browsing, a lot of Whop’s listings seem to revolve around sports betting, crypto and general wealth-growing strategies. There’s nothing wrong with that, necessarily. But I wonder if it all has staying power — and how carefully it’s moderated.

Whop provides analytics to sellers on the backend.

As with all marketplaces, there’s a risk, also, that bad actors manipulate the platform to drive sales to scammy products — whether through fake reviews or dubious SEO practices. Whop says that it takes steps to mitigate this, but it’s tough to know how extensive those steps are, particularly considering Whop’s small team (20 people).

But investors see potential in what Whop’s doing. The company today announced that it raised $17 million in a Series A round with participation from Insight Partners, The Chainsmokers, Peter Thiel and others. The tranche values the startup at over $100 million — a healthy valuation for a marketplace of around a million customers and 3,000 sellers that’s done $100 million in transactions to date.

“We have loads of runway,” Schwartz said. “The pandemic and slowdown in tech have been incredible for us; the slowdown in venture in particular has resulted in numerous small-scale, cash-focused products, which is exactly what our product supports.”

Exploring the Power and Limitations of GPT-4


Ever wondered how the newer, more advanced GPT-4 compares to its predecessor, ChatGPT 3.5?

What is it about GPT-4 that makes it stand out, and where does it reach its limitations?

As we delve into the limitations of language models and their potential applications, it's crucial to comprehend these nuances.

In this journey through the landscape of artificial intelligence, we’ll highlight the distinctive features of ChatGPT 4, its innovative plug-ins, and how they are revolutionizing the way we solve technical questions and problems in the field of data science.

From addressing the complexity of statistics and probability to wrestling with complex modeling problems and coding queries in Python, SQL, or R, we'll explore all the ways GPT-4 can assist in data science.

So, buckle up as we embark on this explorative journey, peeling back the layers of GPT-4 and its capacity to reshape our approach to data science!

ChatGPT 3.5 and ChatGPT 4

GPT-3.5, the model originally behind ChatGPT, was made public by OpenAI in November 2022.

It became immensely popular worldwide almost immediately, so popular that it exceeded server capacity at times, which led OpenAI to introduce a paid version, ChatGPT Plus, to manage the demand.

Moving on from GPT-3.5, GPT-4 was released on March 14, 2023, and it brings a significant upgrade.

It has more advanced reasoning capabilities, and although the model can accept images as input in addition to text, the current version of ChatGPT Plus still produces text output only.

GPT-4's problem-solving is also more accurate than GPT-3.5's.

Currently, the paid version of ChatGPT+ costs $20 per month. Yet it has its restrictions, like the cap of 25 messages every 3 hours.

Also, Microsoft’s search engine Bing has been running a customized version of GPT-4.

For more information, here is the official blog post about GPT-4 on the OpenAI website.

ChatGPT Plug-ins

[Source: https://chat.openai.com/]

OpenAI unveiled a new feature for ChatGPT — plugins — on March 23.

These tools are designed to augment the language model's capabilities, allowing it to access the most recent information and perform computations.

The rollout of these plugins is being done gradually to assess their practical use, impact, and potential safety and alignment issues.

Various companies, including KAYAK, Klarna, Milo, Shopify, Slack, Speak, Wolfram, Zapier, and more, have developed the initial set of plugins.

These plugins offer a broad spectrum of functionalities, such as trip planning, grocery ordering, computation access, and real-time data retrieval.

OpenAI also provides two plugins: a web browser and a code interpreter.

While the web browser lets the GPT model access the live web, the code interpreter helps you manipulate and analyze your data simply by prompting.

Some of these plug-ins, like Noteable and Wolfram, are especially useful in data science.

These plugins are intended to increase the utility of ChatGPT.

Can GPT-4 Be Used to Answer Technical Questions in Data Science?

We can divide data science questions into non-coding and coding interview questions.

By testing ChatGPT-4's ability in these two areas, we'll better understand how it can be used to answer technical questions in data science.

Let’s start with the non-coding questions.

Non-coding Questions

Here, we will divide the non-coding questions into statistics, probability, and modeling.

[Image by Author]

Statistics

Statistics is vital to data science, encompassing data gathering, analysis, interpretation, display, and organization.

GPT-4 is equipped to answer questions about statistical concepts, offering a comprehensive understanding of complex statistical principles.

Using GPT-4 this way can improve the understanding of statistical theories and empower data scientists to extract meaningful insights from data.


Here is the statistics question we'll use to test ChatGPT's abilities. We are going to use the Wolfram plug-in, which assists with calculations.

One important note when using ChatGPT-4 and its plugins: they may be good, but they're not perfect, so always double-check the output, because it can be wrong.

Prompt: Why is MSE frequently used as a loss function for Linear Regression?

(Use Wolfram Plug-in, provide formulas)

ChatGPT: The formula for MSE is:

MSE = (1/n) * Σᵢ (yᵢ − ŷᵢ)²

where:

  • n is the number of observations,
  • yᵢ is the actual value, and
  • ŷᵢ is the predicted value.

Here are the reasons why MSE is commonly used:

  1. Differentiability: MSE is a smooth and differentiable function. This is important because optimization algorithms, like gradient descent, rely on the function's derivative to find the minimum.
  2. Sensitivity to outliers: The squaring term in MSE gives higher weight to larger errors. This means the model will be more sensitive to outliers and will try to minimize large errors.
  3. Interpretability: MSE is easy to interpret as it represents the average squared difference between the predicted and actual values. A lower MSE indicates a better fit of the model to the data.
  4. Convexity: MSE leads to a convex loss landscape. Convex problems have a single global minimum, making them easier to solve using optimization algorithms.
  5. Consistency with OLS: In the case of linear regression, minimizing the MSE is equivalent to minimizing the residuals in Ordinary Least Squares (OLS), which is the standard method for estimating the parameters of a linear regression model.

Note from the author: Even though MSE has these advantages, it's not always the best choice for every regression problem. Depending on the specific characteristics of the data and the problem at hand, other loss functions might be more appropriate.
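As a quick sanity check of the formula above, MSE can be computed directly with NumPy (the numbers here are made up purely for illustration):

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE = (1/n) * sum of squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.0 + 4.0 + 1.0) / 4 = 1.3125
```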

Control

ChatGPT's answer is detailed and correct.

Probability

Probability is integral to data science as it offers a structure for making predictions from data and comprehending uncertainty. GPT-4 can illustrate these concepts in detail, explaining the mathematical computations and logic behind the likelihood of a specific event.

This assists data scientists in building models and making intelligent decisions grounded in probabilistic reasoning.


Here is the probability question ChatGPT will try to solve.

Prompt: You are playing a game with a friend where you roll two fair six-sided dice. If the sum of the two dice is 7 or 11, you win; otherwise, your friend wins. What is the probability that you win on your first roll?

ChatGPT: To determine the probability of winning on your first roll, we need to calculate the number of favorable outcomes (winning outcomes) and divide it by the total number of possible outcomes.

In this case, there are 6 possible outcomes for each die since they are fair six-sided dice. Therefore, the total number of outcomes when rolling two dice is: 6 × 6 = 36

To win on the first roll, the sum of the two dice must be either 7 or 11. We can calculate the number of favorable outcomes as follows:

  1. Sum of 7: There are 6 ways to achieve this outcome: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1).
  2. Sum of 11: There are 2 ways to achieve this outcome: (5, 6) and (6, 5).

Therefore, there are a total of 6 + 2 = 8 favorable outcomes.

Now, we can calculate the probability of winning on the first roll by dividing the number of favorable outcomes by the total number of possible outcomes:

Probability of winning on the first roll = Number of favorable outcomes / Total number of possible outcomes = 8 / 36 = 2 / 9 ≈ 0.222

So, the probability that you win on your first roll is approximately 0.222 or 22.2%.
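You can verify this result by brute-force enumeration of all 36 rolls, a quick way to double-check probability answers like this one:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))
wins = sum(1 for a, b in outcomes if a + b in (7, 11))

print(wins, len(outcomes), wins / len(outcomes))  # 8 36 0.2222...
```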

Control

The output is correct and matches the solution from the platform.

Modeling

Modeling is the process of forming a mathematical depiction of a system to analyze and forecast behavior.

It can help data scientists comprehend concepts, assumptions, implementation, and interpretation of their results. This leads to improved predictions and comprehension of the data.


Here is the modeling question we are going to test.

Prompt: How does the K-means algorithm work?

ChatGPT: The K-means algorithm is an iterative clustering algorithm that aims to partition a given dataset into K clusters, where K is a user-defined parameter.

It works as follows:

  1. Initialization: Randomly select K data points from the dataset as the initial centroids (representative points) of the clusters.
  2. Assignment: Assign each data point to the nearest centroid based on a distance metric, usually the Euclidean distance. This step forms K clusters.
  3. Update: Recalculate the centroids of the clusters by taking the mean of all data points assigned to each cluster. This step aims to find better centroids that represent the center of each cluster.
  4. Iteration: Repeat the assignment and update steps iteratively until convergence. Convergence occurs when either the centroids stop changing significantly or the maximum number of iterations is reached.
  5. Output: The algorithm returns the final K clusters, where each data point belongs to the cluster with the nearest centroid.

The K-means algorithm seeks to minimize the within-cluster sum of squared distances, also known as the inertia or the total intra-cluster variance. It assumes that the clusters are spherical and have equal variance. However, the algorithm may converge to suboptimal solutions depending on the initialization of centroids.

To enhance the performance of K-means, various techniques can be applied, such as using multiple initializations, performing smarter centroid initialization methods (e.g., K-means++), or using the Elbow method to determine the optimal value of K.

It's important to note that K-means is sensitive to the initial choice of centroids and may struggle with datasets containing non-linear or irregularly shaped clusters. In such cases, alternative clustering algorithms like DBSCAN or Gaussian Mixture Models (GMM) may be more suitable.
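The five steps ChatGPT lists can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; for example, it ignores edge cases such as empty clusters:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assignment: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Iterate until convergence (centroids stop moving)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # 5. Output: final cluster labels and centroids
    return labels, centroids

# Two well-separated toy clusters
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
labels, centroids = kmeans(X, k=2)
```

On this toy data, the first three points and the last three points end up in different clusters, as expected.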

Control

ChatGPT's answer is detailed and correct.

Coding Questions

Coding is mandatory for data scientists. It includes using Python for data wrangling and machine learning, SQL for database handling, and R for statistical computations. GPT-4 can assist in resolving coding issues.

[Image by Author]

Let’s test ChatGPT-4 on coding questions, starting with Python.

Python

Python is a favored language in the data science community due to its user-friendly nature and the presence of robust libraries such as pandas, numpy, and scikit-learn.

GPT-4 can assist in debugging Python code, recommending best coding practices, and even producing code snippets for particular tasks.

Now let’s try it with the following question.


Question 1

Here is the question.

Prompt:

My dataset name is airbnb_search_details and here is more information;

Unnamed: 0: int64
id: int64
price: float64
property_type: object
room_type: object
amenities: object
accommodates: int64
bathrooms: int64
bed_type: object
cancellation_policy: object
cleaning_fee: bool
city: object
host_identity_verified: object
host_response_rate: object
host_since: datetime64[ns]
neighbourhood: object
number_of_reviews: int64
review_scores_rating: float64
zipcode: int64
bedrooms: int64
beds: int64

The question is: Find all searches for accommodations where the number of bedrooms is equal to the number of bathrooms. Make sure that the expected output type is a dataframe.

Note from the author: To feed the dataset into the chat, just copy it from the platform as shown below.

ChatGPT: To find all searches for accommodations where the number of bedrooms is equal to the number of bathrooms, you can use the following code in Python:

import pandas as pd

# Assuming 'airbnb_search_details' is the DataFrame containing your dataset
# Replace 'airbnb_search_details' with the actual name of your DataFrame if different

# Filter the DataFrame based on the condition: number of bedrooms == number of bathrooms
result = airbnb_search_details[airbnb_search_details['bedrooms'] == airbnb_search_details['bathrooms']]

# Display the resulting DataFrame
print(result)

Control

Actually, most of ChatGPT's answer is correct, except for the last line. There is no need to print out the result DataFrame using the print function, so the solution works when the code is updated as follows.

import pandas as pd

# Assuming 'airbnb_search_details' is the DataFrame containing your dataset
# Replace 'airbnb_search_details' with the actual name of your DataFrame if different

# Filter the DataFrame based on the condition: number of bedrooms == number of bathrooms
result = airbnb_search_details[airbnb_search_details['bedrooms'] == airbnb_search_details['bathrooms']]

# Display the resulting DataFrame
result
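If you want to check this filter without the platform's dataset, here is a self-contained toy example; the data below is invented, and only the column names match the prompt:

```python
import pandas as pd

# Invented miniature version of the airbnb_search_details dataset
airbnb_search_details = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "bedrooms": [1, 2, 3, 2],
    "bathrooms": [1, 3, 3, 1],
})

# Same condition as in the solution: bedrooms == bathrooms
result = airbnb_search_details[
    airbnb_search_details["bedrooms"] == airbnb_search_details["bathrooms"]
]
print(result["id"].tolist())  # [1, 3]
```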

Question 2

Here is the question from LeetCode that I'll feed into ChatGPT-4.


I am going to basically copy and paste all the information in my prompt.

Prompt:

Given an integer array nums and an integer val, remove all occurrences of val in nums in-place. The order of the elements may be changed. Then return the number of elements in nums which are not equal to val.

Consider the number of elements in nums which are not equal to val be k, to get accepted, you need to do the following things:

Change the array nums such that the first k elements of nums contain the elements which are not equal to val. The remaining elements of nums are not important as well as the size of nums.

Return k.

Custom Judge:

The judge will test your solution with the following code:

int[] nums = […]; // Input array
int val = …; // Value to remove
int[] expectedNums = […]; // The expected answer with correct length.
                          // It is sorted with no values equaling val.

int k = removeElement(nums, val); // Calls your implementation
assert k == expectedNums.length;

sort(nums, 0, k); // Sort the first k elements of nums
for (int i = 0; i < actualLength; i++) {
    assert nums[i] == expectedNums[i];
}

If all assertions pass, then your solution will be accepted.

Example 1:

Input: nums = [3,2,2,3], val = 3

Output: 2, nums = [2,2,_,_]

Explanation: Your function should return k = 2, with the first two elements of nums being 2.

It does not matter what you leave beyond the returned k (hence they are underscores).

Example 2:

Input: nums = [0,1,2,2,3,0,4,2], val = 2

Output: 5, nums = [0,1,4,0,3,_,_,_]

Explanation: Your function should return k = 5, with the first five elements of nums containing 0, 0, 1, 3, and 4.

Note that the five elements can be returned in any order.

It does not matter what you leave beyond the returned k (hence they are underscores).

Constraints:

0 <= nums.length <= 100

0 <= nums[i] <= 50

0 <= val <= 100

The solution format:

class Solution:
    def removeElement(self, nums: List[int], val: int) -> int:

ChatGPT: Here's the solution in Python:

class Solution:
    def removeElement(self, nums: List[int], val: int) -> int:
        k = 0  # Counter for elements not equal to val

        # Iterate through the array
        for i in range(len(nums)):
            if nums[i] != val:
                nums[k] = nums[i]  # Move the element to the kth position
                k += 1

        return k

The solution uses a two-pointer approach to modify the nums array in-place and count the number of elements that are not equal to val. The variable k is used to keep track of the position where the next non-val element should be placed.

The algorithm iterates through the array and whenever it encounters an element that is not equal to val, it is moved to the kth position, and k is incremented. In the end, k represents the number of elements not equal to val.

You can test the solution using the provided custom judge code to verify if it produces the expected output for the given examples and passes the assertions.
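To convince yourself the two-pointer logic behaves as LeetCode expects, here is a standalone restatement of the same function run against Example 2 from the problem:

```python
from typing import List

def remove_element(nums: List[int], val: int) -> int:
    # Two-pointer overwrite: k tracks where the next kept element goes
    k = 0
    for x in nums:
        if x != val:
            nums[k] = x
            k += 1
    return k

nums = [0, 1, 2, 2, 3, 0, 4, 2]
k = remove_element(nums, 2)
print(k, sorted(nums[:k]))  # 5 [0, 0, 1, 3, 4]
```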

Control

The code is working perfectly without needing any updates.

SQL

SQL is vital for working with databases. GPT-4 can help formulate SQL queries, enhance database efficiency, and grasp intricate SQL concepts.

Question 1

Here is the SQL question we’ll try to solve.

XXXXX

Prompt: Here is the question: Following a recent advertising campaign, the marketing department wishes to classify its efforts based on the total number of units sold for each product.

You have been tasked with calculating the total number of units sold for each product and categorizing ad performance based on the following criteria for items sold:

  • Outstanding: 30+
  • Satisfactory: 20-29
  • Unsatisfactory: 10-19
  • Poor: 1-9

Your output should contain the product ID, total units sold in descending order, and its categorized ad performance.

My dataset name is marketing_campaign and here are the data types.

marketing_campaign
user_id: int
created_at: datetime
product_id: int
quantity: int
price: int

Here is the expected output:

product_id  total_sold  ad_performance
105         41          Outstanding
102         29          Satisfactory
114         23          Satisfactory
118         22          Satisfactory
120         21          Satisfactory

Use MySQL to solve this.

ChatGPT: To solve this task using MySQL, you can write a query to calculate the total number of units sold for each product and categorize the ad performance based on the given criteria.

Here's the query:

SELECT
    product_id,
    SUM(quantity) AS total_sold,
    CASE
        WHEN SUM(quantity) >= 30 THEN 'Outstanding'
        WHEN SUM(quantity) BETWEEN 20 AND 29 THEN 'Satisfactory'
        WHEN SUM(quantity) BETWEEN 10 AND 19 THEN 'Unsatisfactory'
        WHEN SUM(quantity) BETWEEN 1 AND 9 THEN 'Poor'
    END AS ad_performance
FROM
    marketing_campaign
GROUP BY
    product_id
ORDER BY
    total_sold DESC;

In this query, we select the product_id, calculate the sum of quantities as total_sold, and use a CASE statement to categorize the ad performance based on the total_sold value. The query groups the results by product_id, orders them in descending order of total_sold, and provides the desired output.

Make sure to replace "marketing_campaign" with the actual name of your table in the database.

Running this query will give you the expected output with the product_id, total_sold, and ad_performance categorized according to the criteria.
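If you don't have a MySQL instance handy, this particular query also runs unchanged against Python's built-in sqlite3 module. The rows below are made up so that the aggregates land in different categories:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE marketing_campaign "
    "(user_id INT, created_at TEXT, product_id INT, quantity INT, price INT)"
)
# Invented sample rows (two rows for product 105 to exercise SUM)
con.executemany(
    "INSERT INTO marketing_campaign VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2021-01-01", 105, 20, 10),
        (2, "2021-01-02", 105, 21, 10),
        (3, "2021-01-03", 102, 29, 10),
        (4, "2021-01-04", 130, 5, 10),
    ],
)

query = """
SELECT product_id,
       SUM(quantity) AS total_sold,
       CASE
           WHEN SUM(quantity) >= 30 THEN 'Outstanding'
           WHEN SUM(quantity) BETWEEN 20 AND 29 THEN 'Satisfactory'
           WHEN SUM(quantity) BETWEEN 10 AND 19 THEN 'Unsatisfactory'
           WHEN SUM(quantity) BETWEEN 1 AND 9 THEN 'Poor'
       END AS ad_performance
FROM marketing_campaign
GROUP BY product_id
ORDER BY total_sold DESC
"""
rows = con.execute(query).fetchall()
print(rows)  # [(105, 41, 'Outstanding'), (102, 29, 'Satisfactory'), (130, 5, 'Poor')]
```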

Control

The code works perfectly without needing any updates.

Question 2

Here is the second SQL question taken from LeetCode.


Prompt: Table: Sales

+-------------+------+
| Column Name | Type |
+-------------+------+
| sale_id     | int  |
| product_id  | int  |
| year        | int  |
| quantity    | int  |
| price       | int  |
+-------------+------+

(sale_id, year) is the primary key of this table. product_id is a foreign key to Product table.

Each row of this table shows a sale on the product product_id in a certain year. Note that the price is per unit.

Table: Product

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
+--------------+---------+

product_id is the primary key of this table.

Each row of this table indicates the product name of each product.

Write an SQL query that selects the product id, year, quantity, and price for the first year of every product sold.

Return the resulting table in any order.

The query result format is in the following example.

Example 1:

Input:

Sales table:

+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |
+---------+------------+------+----------+-------+

Product table:

+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |
+------------+--------------+

Output:

+------------+------------+----------+-------+
| product_id | first_year | quantity | price |
+------------+------------+----------+-------+
| 100        | 2008       | 10       | 5000  |
| 200        | 2011       | 15       | 9000  |
+------------+------------+----------+-------+

ChatGPT: To select the product id, year, quantity, and price for the first year of every product sold, you can use the following SQL query:

SELECT s.product_id, s.year AS first_year, s.quantity, s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) t ON s.product_id = t.product_id AND s.year = t.first_year
ORDER BY s.product_id;

Explanation:

  1. The subquery (SELECT product_id, MIN(year) AS first_year FROM Sales GROUP BY product_id) selects the minimum year for each product using the MIN() function and groups the results by product_id.
  2. The main query joins the Sales table with the subquery based on the matching product_id and year columns to retrieve the sales records for the first year of each product.
  3. The selected columns, including product_id, year as first_year, quantity, and price, are returned from the Sales table for the matching records.
  4. The result is ordered by product_id in ascending order.

The query will return the product id, first year, quantity, and price for the first year of every product sold, as shown in the example output.
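The join can also be sanity-checked locally with Python's built-in sqlite3 module, using the example Sales rows from the problem statement:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Sales (sale_id INT, product_id INT, year INT, quantity INT, price INT)"
)
# Example rows from the problem statement
con.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 2008, 10, 5000), (2, 100, 2009, 12, 5000), (7, 200, 2011, 15, 9000)],
)

query = """
SELECT s.product_id, s.year AS first_year, s.quantity, s.price
FROM Sales s
JOIN (
    SELECT product_id, MIN(year) AS first_year
    FROM Sales
    GROUP BY product_id
) t ON s.product_id = t.product_id AND s.year = t.first_year
ORDER BY s.product_id
"""
rows = con.execute(query).fetchall()
print(rows)  # [(100, 2008, 10, 5000), (200, 2011, 15, 9000)]
```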

Control

The code works perfectly without needing any updates.

R

R is a language purpose-built for statistical evaluation and data visualization. GPT-4 can help with the subtleties of the R language, debugging R code, and employing various R packages for data analysis.

Question


Here is the coding question.

Prompt:

Here is the question: For each guest reviewer, find the nationality of the reviewer’s favorite host based on the guest’s highest review score given to a host. Output the user ID of the guest along with their favorite host’s nationality. In case there is more than one favorite host from the same country, list that country only once (remove duplicates).

Both the from_user and to_user columns are user IDs.

Here is the expected output:

from_user  nationality
0          Luxembourg
1          Brazil
2          Australia
3          Australia
4          Brazil

Here are the data frames:

airbnb_reviews
from_user: int64
to_user: int64
from_type: object
to_type: object
review_score: int64

airbnb_hosts
host_id: int64
nationality: object
gender: object
age: int64

Solve this with R.

Hints:

Select specific column/s from dataframe using [column_name] then select rows with values equal to ‘==’ guest

Use .groupby(column_name) to group the dataframe about the specified column and max() to get the maximum score given per group

Perform inner join on original df and grouped df to get the corresponding attributes/details of the grouped df by using pd.merge(dataframe1, dataframe2, on = common_table_keys)

Perform inner join on resulting df and airbnb_hosts

Use [ [ column_name/s] ] to select a specified column of the dataframe then use drop_duplicates(column_name) to get distinct values in the specified column.

ChatGPT: To solve this task using R, you can follow these steps:

  1. Select the necessary columns from the 'airbnb_reviews' dataframe and filter the rows where 'from_type' is 'guest'.
  2. Group the filtered dataframe by 'from_user' and get the maximum review score for each guest using the max() function.
  3. Perform an inner join between the original dataframe and the grouped dataframe based on the 'from_user' and 'review_score' columns to get the corresponding rows with the maximum review score for each guest.
  4. Perform another inner join between the resulting dataframe and the 'airbnb_hosts' dataframe based on the 'to_user' and 'host_id' columns to get the nationality of the favorite host.
  5. Select the 'from_user' and 'nationality' columns from the final joined dataframe.
  6. Use the distinct() function to remove duplicate rows based on the 'nationality' column.

Here's the code to achieve this:

# distinct() comes from the dplyr package, so load it first
library(dplyr)

# Select necessary columns from airbnb_reviews and filter rows where from_type is 'guest'
filtered_reviews <- airbnb_reviews[airbnb_reviews$from_type == 'guest', c('from_user', 'to_user', 'review_score')]

# Group by from_user and get the maximum review score for each guest
max_scores <- aggregate(review_score ~ from_user, filtered_reviews, max)

# Perform inner join to get the rows with the maximum review score for each guest
joined_df <- merge(filtered_reviews, max_scores, by = c('from_user', 'review_score'))

# Perform inner join with airbnb_hosts to get the nationality of the favorite host
final_df <- merge(joined_df, airbnb_hosts, by.x = 'to_user', by.y = 'host_id')

# Select the necessary columns from the final joined dataframe
result <- final_df[, c('from_user', 'nationality')]

# Remove duplicate rows
distinct_result <- distinct(result, .keep_all = TRUE)

# Print the final result
distinct_result

Running this code will give you the expected output with the 'from_user' (guest ID) and the corresponding 'nationality' of their favorite host.

Control

The code works perfectly without needing any updates.

Bonus: Doing a Data Project With ChatGPT

The data project we’ll test ChatGPT on has been used as a take-home assignment in the recruitment process for the data science positions at Uber.


Overall, there are two different scenarios:

Scenario 1: It is going to be a huge Saturday and there will need to be many more cars on the road than last week. In order to get drivers to go online, we're assessing the following two bonus options in terms of cost:

  • Option 1: $50 for each driver that is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame;
  • Option 2: $4/trip for all drivers who complete 12 trips, and have a 4.7 or better rating.

Scenario 2: A taxi driver currently generates $200 per day in fares (before expenses), works six days a week, takes three weeks off, and has the following expenses:

  • Gas — $200 per week
  • Insurance — $400 per month
  • Vehicle rent (by the week) — $500

The driver doesn't pay gas and rent expenses on off weeks.
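Before handing Scenario 2 to ChatGPT, it helps to work out the baseline arithmetic yourself (assuming a 49-week working year, i.e. 52 weeks minus 3 weeks off), so you can check the model's answer later:

```python
working_weeks = 52 - 3  # three weeks off per year

# Revenue: $200/day in fares, six days a week
revenue = 200 * 6 * working_weeks  # 58,800

# Expenses: gas and rent only in working weeks, insurance all 12 months
expenses = (200 + 500) * working_weeks + 400 * 12  # 39,100

net_income = revenue - expenses
print(net_income)  # 19700
```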

ChatGPT File Uploader

Now, to do a data project, we need the ChatGPT File Uploader Chrome extension. It will allow you to upload files to ChatGPT.

You don't need it if you already have the Code Interpreter plug-in; simply upload a file using that plug-in instead.

For more information about it, please watch this video.

After setting up the File Uploader extension, let's try to solve this data project.

Open ChatGPT and click the extension's button.

You will see the submit button as shown below.

[Source: https://chat.openai.com/]

Now, let's feed ChatGPT the dataset by selecting our dataset file and clicking the Submit File button.

[Source: https://chat.openai.com/]

After loading the dataset, the extension prompts ChatGPT with its contents.

[Source: https://chat.openai.com/]

Now, here is the prompt you should use to solve this data project.


Prompt: Given the operational scenarios and the provided dataset, answer the following questions by using Python

Assignment

Scenario 1: It is going to be a huge Saturday and there will need to be many more cars on the road than last week. In order to get drivers to go online, we're assessing the following two bonus options in terms of cost:

Option 1: $50 for each driver that is online at least 8 hours, accepts 90% of requests, completes 10 trips, and has a rating of 4.7 or better during the time frame;

Option 2: $4/trip for all drivers who complete 12 trips, and have a 4.7 or better rating.

Using the dataset provided and given Scenario 1, provide answers to the questions below:

How much would the total bonus payout be with Option 1?

How much would the total bonus payout be with Option 2?

How many drivers would qualify for a bonus under Option 1 but not under Option 2?

What percentages of drivers online completed less than 10 trips, had an acceptance rate of less than 90%, and had a rating of 4.7 or higher?

Scenario 2: A taxi driver currently generates $200 per day in fares (before expenses), works six days a week, takes three weeks off, and has the following expenses:

Gas — $200 per week

Insurance — $400 per month

Vehicle rent (by the week) — $500

The driver doesn't pay gas and rent expenses on off weeks.

Now, let's assume that the same driver would buy a Town Car and partner with Uber. If he does, his gas expenses would go up by 5%, his insurance expense would decrease by 20%, and he would no longer be renting a vehicle. However, he would need to buy a car. The driver would still take three weeks off per year.

Given Scenario 2, provide answers to the questions below:

How much money (after expenses) does the taxi driver make per year without partnering with Uber?

You are convincing the same driver above to buy a Town Car and partner with Uber. Assuming the new car is 40,000 USD, how much would the driver's gross fares need to increase per week to fully pay for the car in year 1 and maintain the same yearly profit margin as before?

ChatGPT:

To perform these computations, we first need to load and preprocess the dataset. We'll use Python's pandas library to load the CSV file and parse the required columns.

Unfortunately, as a text-based model, I am unable to directly execute code. But, I can provide you with the Python code you can use to answer your questions:

Here's the code you would need to answer your questions for Scenario 1:

import pandas as pd

# Load dataset
data = pd.read_csv('dataset_2.csv')

# Convert 'Accept Rate' from string percentage to float
data['Accept Rate'] = data['Accept Rate'].str.rstrip('%').astype('float') / 100.0

# Calculate bonus under both options
data['Option 1 Bonus'] = ((data['Supply Hours'] >= 8) & (data['Accept Rate'] >= 0.9) &
                          (data['Trips Completed'] >= 10) & (data['Rating'] >= 4.7)) * 50
data['Option 2 Bonus'] = ((data['Trips Completed'] >= 12) & (data['Rating'] >= 4.7)) * data['Trips Completed'] * 4

# Compute total payout for each option
total_payout_option1 = data['Option 1 Bonus'].sum()
total_payout_option2 = data['Option 2 Bonus'].sum()

# Compute number of drivers who qualify under Option 1 but not Option 2
num_option1_only = ((data['Option 1 Bonus'] > 0) & (data['Option 2 Bonus'] == 0)).sum()

# Compute percentage of drivers who completed less than 10 trips, had an acceptance rate
# of less than 90%, and had a rating of 4.7 or higher
percent_undesirable_drivers = ((data['Trips Completed'] < 10) & (data['Accept Rate'] < 0.9) & (data['Rating'] >= 4.7)).mean() * 100

print('Total bonus payout for Option 1: $', total_payout_option1)
print('Total bonus payout for Option 2: $', total_payout_option2)
print('Number of drivers who qualify under Option 1 but not Option 2: ', num_option1_only)
print('Percentage of drivers who complete less than 10 trips, had an acceptance rate of less than 90%, and had a rating of 4.7 or higher: ', percent_undesirable_drivers)

Here's the code to answer your questions for Scenario 2:

# Calculate the taxi driver's yearly income and expenses
daily_income = 200
weekly_income = daily_income * 6  # Working 6 days a week
yearly_income = weekly_income * 52  # Working all year
yearly_expenses = (200 * 49) + (400 * 12) + (500 * 49)  # No expenses on off weeks
yearly_net = yearly_income - yearly_expenses

# Calculate the new expenses and net income if the driver partners with Uber
new_yearly_expenses = (200 * 1.05 * 49) + (400 * 0.8 * 12)  # 5% increase in gas, 20% decrease in insurance
new_yearly_net = yearly_income - new_yearly_expenses - 40000  # Subtract car cost

# Calculate the increase in gross fares necessary to maintain the same profit margin
additional_income_required = yearly_net - new_yearly_net
additional_income_required_weekly = additional_income_required / 52

print('Taxi driver yearly net income: $', yearly_net)
print('Required increase in weekly gross fares: $', additional_income_required_weekly)

You can replace the file path in the read_csv function with the actual path to your dataset_2.csv file, and then run these code snippets in your local Python environment.

Control

The first scenario's output is completely correct (the first four questions). However, the second scenario's output is not.

For example, it calculates that the driver works 52 weeks (all year); however, the driver takes 3 weeks off.

That’s why you need to be careful about the calculations that ChatGPT makes and check the steps it follows to be sure about the result.

After a few tries, here is the code ChatGPT provides that outputs the correct result.

# Weeks off per year
weeks_off = 3

# Total weeks per year
total_weeks_per_year = 52

# Weeks of work per year
working_weeks = total_weeks_per_year - weeks_off

# Total months per year
total_months_per_year = 12

# Expenses
gas_per_week = 200
insurance_per_month = 400
vehicle_rent_per_week = 500

# Revenue
fare_per_day = 200
work_days_per_week = 6
total_revenue = fare_per_day * work_days_per_week * working_weeks

# Total Expenses
total_expenses = (gas_per_week + vehicle_rent_per_week) * working_weeks + insurance_per_month * total_months_per_year

# Net Income
net_income = total_revenue - total_expenses

print("Net Income per year: $", net_income)

# For Question 6:

# Expenses changes
new_gas_per_week = gas_per_week * 1.05  # 5% increase
new_insurance_per_month = insurance_per_month * 0.8  # 20% decrease
car_cost = 40000

# New Total Expenses
new_total_expenses = car_cost + (new_gas_per_week * working_weeks) + (new_insurance_per_month * total_months_per_year)

# To maintain the same net income, the new revenue should be:
new_total_revenue = new_total_expenses + net_income

# This translates to a required increase in weekly fares:
additional_fare_per_week = (new_total_revenue - total_revenue) / working_weeks

print("Required increase in weekly fares: $", additional_fare_per_week)
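Working the same figures out by hand is the kind of cross-check the "Control" step calls for. Under the problem's numbers, the corrected script should report a net income of $19,700 and a required weekly fare increase of roughly $306.73:

```python
# Hand check of the corrected script's arithmetic
working_weeks = 52 - 3                                      # 49
total_revenue = 200 * 6 * working_weeks                     # 58,800
total_expenses = (200 + 500) * working_weeks + 400 * 12     # 34,300 + 4,800 = 39,100
net_income = total_revenue - total_expenses                 # 19,700

# Question 6: buy a $40,000 car, gas +5%, insurance -20%, no more vehicle rent
new_total_expenses = 40000 + (200 * 1.05) * working_weeks + (400 * 0.8) * 12
additional_fare_per_week = (new_total_expenses + net_income - total_revenue) / working_weeks

print(net_income)                          # 19700
print(round(additional_fare_per_week, 2))  # 306.73
```

Note that the Question 6 expenses deliberately drop the $500/week vehicle rent, since the driver now owns the car; that modeling choice is what makes the $40,000 purchase comparable to the old rental arrangement.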

Final Words

This exploration has shed light on the innovative applications and inherent limitations of GPT-4, particularly in the realm of data science.

Whether tackling complex statistical and probability problems, modeling issues, or fielding coding queries, GPT-4 has shown promising capabilities.

However, it's essential to remember that while GPT-4 provides a valuable tool, human oversight and understanding remain critical to ensure quality outcomes, especially in mathematical calculations.

This exploration is only the beginning of the GPT-4 journey, as further advancements and refinements are always on the horizon. Sam Altman, the OpenAI CEO, has said that in a future model (GPT-5), OpenAI plans to address the problems of inaccurate calculations and the cap on the number of prompts.

Continue to stay curious, keep exploring, and remember, the only limit to AI's potential is the one we set for it.

Nate Rosidi is a data scientist who also works in product strategy. He's an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.


WormGPT: What to know about ChatGPT’s malicious cousin


It was only a matter of time before the AI chatbot was emulated for malicious purposes — and one such tool is now on the market, known as WormGPT.

When ChatGPT was made available to the public on November 30, 2022, the AI chatbot took the world by storm.

The software was developed by OpenAI, an AI and research company. ChatGPT is a natural language processing tool able to answer queries and provide information based on data gleaned from its training datasets, including books and online web pages, and it has since become a valued tool for on-the-fly information gathering, analysis, and writing tasks for millions of users worldwide.


While some experts believe the technology could prove as disruptive as the internet itself, others note that ChatGPT demonstrates 'confident inaccuracy.' Students in droves have been caught plagiarising coursework by way of the tool, and unless datasets are verified, tools such as ChatGPT could become unwitting vehicles for spreading misinformation and propaganda.

Indeed, the US Federal Trade Commission (FTC) is investigating OpenAI over its handling of personal information and the data used to create its language model.

Beyond data protection concerns, however, whenever a new technological innovation arrives, so do pathways for abuse. It was only a matter of time before the AI chatbot was emulated for malicious purposes — and one such tool is now on the market, known as WormGPT.

There is a subscription option, ChatGPT Plus, which users can sign up for. The subscription costs $20 per month and provides users with access to ChatGPT even during peak times, faster response times, and priority access to improvements and fixes.

