New ‘BeFake’ social media app encourages users to transform their photos with AI

BeFake AI

Although most social media sites claim to be platforms where people can share their most authentic selves with the world, the profiles display carefully curated windows into what users want others to see. BeFake AI, a new free app, leans into the fakeness of social media.

BeFake AI claims to be the "First AI-Augmented Social Network" where users can use AI to modify their images into art creations and upload them online.

The app's name and design are a play on BeReal, the social media platform released in January 2020 that attempts to bring authenticity back to social media apps with impromptu notifications where users have to share photos without filters within two minutes.

The app listing even slyly refers to BeReal saying, "Why be real when it's fun to BeFake?"

Like BeReal, BeFake users will get random notifications throughout the day to upload a front and back camera picture. However, unlike BeReal, users don't upload raw, unedited photos. Instead, they use AI to edit and post the photos.

Because the editing process takes longer than snapping a photo and uploading, users have a 20-minute window to upload their AI photos on BeFake.

To edit the photo, you can use the predetermined prompts or personalize your own, similar to how you would on DALL-E or other AI art generators.

Once you are satisfied with your photo, you can upload your edited picture. However, there is a catch. Despite posting a picture that doesn't resemble the original photo, users can swipe to see the original photo and the AI-transformed one.

However, if you would rather stay anonymous, there is a subscription you can join for $9.99 per month, $99.99 per year, or $2.99 per week, which gives you perks such as hiding the original photo, posting late without penalty, and having unlimited photo generations.

The app is available for download in the Google Play Store and Apple App Store. Before you download the app, be aware that (like any other AI model) it can use your personal data for its own use, such as training its models further.

The app's privacy policy says it can use your personal data "to provide and maintain our Service" as well as "for business transfers" or "other purposes," which include "data analysis, identifying usage trends, determining the effectiveness of our promotional campaigns and to evaluate and improve our Service, products, services, marketing and your experience."

In a win for humans, federal judge rules that AI-generated artwork can’t be copyrighted

Computer generated image

(Not an AI-generated image.)

In the battle against artificial intelligence in the creative sphere, humans have picked up a win. A federal judge ruled last week that works generated by AI cannot be copyrighted.

The ruling, delivered from U.S. District Judge Beryl Howell, said that copyright law has never "stretched so far" as to "protect works generated by new forms of technology operating absent any guiding human hand. Human authorship is a bedrock requirement."

The notion of AI generated works not receiving legal protection is good news for people working in creative fields — especially as we're more than 100 days into the Hollywood SAG-AFTRA strike and the idea of using artificial intelligence to create scripts is gaining steam.

U.S. copyright law was designed to adapt with the times, the ruling added, but there has been a consistent belief that human involvement is "at the core of copyrightability, even as that human creativity is channeled through new tools or into new media," the ruling stated.

This mindset predates AI though, as nearly a decade ago, a U.S. district court made a similar ruling in the case of a selfie taken by a monkey. That selfie, the court ruled, couldn't be copyrighted because the image wasn't taken by a person.

Judge Howell noted that in the instance of a camera, for example, the camera does technically generate the image, but only after the human first conceives the image, sets the scene and lighting, adjusts camera parameters, and more. AI does have human involvement in, say, creating a prompt, but the actual work is all computer-generated.

This recent ruling comes as the result of a lawsuit by computer scientist Stephen Thaler, who argued that an image created by AI software he created should be allowed to be copyrighted.

The U.S. Copyright Office turned down his application for protection, saying that "the nexus between the human mind and creative expression" was a necessary part of getting a copyright. Thaler, in turn, filed a lawsuit challenging that decision.

And while this is a strike against AI in art, it's worth noting that the Copyright Office has ruled that "AI-assisted" art can be copyrighted if a human "selected or arranged it in a sufficiently creative way."

Snapchat is expanding further into generative AI with ‘Dreams’

By Sarah Perez (@sarahintampa)

Snapchat is preparing to further expand into generative AI features, after earlier launching its AI-powered chatbot My AI which can now respond with a Snap back, not just text. With the company’s forthcoming generative AI feature called “Dreams,” Snap will again experiment with AI images — but soon, those images may contain you and your friends in imaginative backgrounds.

The company has been developing features that allow Snapchat users to take or upload selfie photos that will allow the app to generate new pictures of you in scenarios you imagine, according to findings from app researcher and developer Steve Moser. This sounds similar to what other AI photo apps on the App Store already offer.

One in particular — an app called Remini — went viral last month as TikTok users realized they could upload their selfies in order to receive professional-looking headshots for LinkedIn without having to pay for a pro photo shoot.

Snapchat is not likely interested in boring headshots, though.

Instead, it imagines Dreams as a way to use AI-generated selfies to place pictures of you in “fantastical places and scenarios,” Moser’s research indicates. Like other AI selfie apps, Snapchat would need clear selfies to work with, not ones where your features are obstructed or those with other people in them. Having a variety of angles, expressions, and lighting conditions will also result in better AI photos, the app will instruct users.

In addition to putting yourself into these AI “Dreams,” the company is developing Dreams with Friends — a feature where users give their friends permission to generate these AI “dream” images with the two of them included, Moser discovered.

References to purchasing Dream Packs found in Snapchat’s app also suggest this may be a monetizable feature at some point.

Dreams was first spotted earlier this spring, when reverse engineer Alessandro Paluzzi revealed the feature would allow users to place their own likeness into realms powered by generative AI. The new feature was being given a prominent placement in Snapchat’s app, right in between the Camera Roll and Stories, he found.

The new developments around Dreams with Friends and the Dream Packs suggest Snapchat is now moving forward with the feature.

Snapchat declined to comment on its plans for Dreams.

Nvidia H100: Are 550,000 GPUs Enough for This Year?

August 21, 2023 by Doug Eadline

The GPU Squeeze continues to place a premium on Nvidia H100 GPUs. In a recent Financial Times article, Nvidia reports that it expects to ship 550,000 of its latest H100 GPUs worldwide in 2023. The appetite for GPUs is obviously coming from the generative AI boom, but the HPC market is also competing for these accelerators. It is not clear if this number includes the throttled China-specific A800 and H800 models.

The bulk of the GPUs will be going to US technology companies, but the Financial Times notes that Saudi Arabia has purchased at least 3,000 Nvidia H100 GPUs and the UAE has also purchased thousands of Nvidia chips. UAE has already developed its own open-source large language model using 384 A100 GPUs, called Falcon, at the state-owned Technology Innovation Institute in Masdar City, Abu Dhabi.

The flagship H100 GPU (14,592 CUDA cores, 80GB of HBM3 capacity, 5,120-bit memory bus), which Nvidia CEO Jensen Huang calls the first chip designed for generative AI, is priced at a massive $30,000 (average). Saudi Arabia’s King Abdullah University of Science and Technology (KAUST) is building its own GPU-based supercomputer called Shaheen III. It employs 700 Grace Hopper chips that combine a Grace CPU and an H100 Tensor Core GPU. Interestingly, the GPUs are being used to create an LLM developed by Chinese researchers who can’t study or work in the US.

Meanwhile, generative AI (GAI) investments continue to fund GPU infrastructure purchases. As reported, in the first 6 months of 2023, funding to GAI start-ups is up more than 5x compared to full-year 2022 and the generative AI infrastructure category has seen over 70% of the funding since Q3’22.

Worth the Wait

The cost of an H100 varies depending on how it is packaged and presumably how many you are able to purchase. The current (Aug-2023) retail price for an H100 PCIe card is around $30,000 (lead times can vary as well). A back-of-the-envelope estimate (550,000 units at $30,000 each) gives a market spending of $16.5 billion for 2023, a big chunk of which will be going to Nvidia. In a recent social media post, Barron’s senior writer Tae Kim estimates that it costs Nvidia $3,320 to make an H100. Measured against the retail cost of an Nvidia H100 card, that is a markup of roughly 800%, or about nine times the estimated manufacturing cost.
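
The back-of-the-envelope figures in this paragraph can be reproduced with a few lines of Python. This is only a sketch based on the numbers quoted in the article (550,000 units, a $30,000 average price, and the $3,320 cost estimate); the variable names are illustrative and none of the inputs are official Nvidia figures.

```python
# Figures quoted in the article; none of them are official Nvidia numbers.
units_shipped = 550_000   # expected H100 shipments in 2023
price_per_gpu = 30_000    # approximate retail price in USD
cost_per_gpu = 3_320      # estimated manufacturing cost in USD

# Total spending if every unit sold at the average retail price
market_spending = units_shipped * price_per_gpu

# How many times the estimated cost the retail price is
price_to_cost = price_per_gpu / cost_per_gpu

# Markup over cost, expressed as a percentage of the cost
markup_percent = (price_per_gpu - cost_per_gpu) / cost_per_gpu * 100

print(f"Market spending: ${market_spending / 1e9:.1f} billion")
print(f"Price is {price_to_cost:.1f}x the estimated cost")
print(f"Markup over cost: {markup_percent:.0f}%")
```

With these inputs the spending comes to $16.5 billion and the retail price is about nine times the estimated cost.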

The Nvidia H100 PCIe GPU.

As often reported, Nvidia’s partner TSMC can barely meet the demand for GPUs. The GPUs require a more complex CoWoS manufacturing process (Chip on Wafer on Substrate — a “2.5D” packaging technology from TSMC where multiple active silicon dies, usually GPUs and HBM stacks, are integrated on a passive silicon interposer.) Using CoWoS adds a complex multi-step, high-precision engineering process that slows down the rate of GPU production.

This situation was confirmed by Charlie Boyle, VP and GM of Nvidia’s DGX systems. Boyle states that delays are not from miscalculating demand or wafer yield issues from TSMC, but instead from the chip packaging CoWoS technology.

HPC in the Great GPU Squeeze

As evidenced by reduced availability, huge purchase quantities, and rising prices, the great GPU Squeeze has begun to affect the HPC market. An important industry expert discussion on this topic will be part of the September 26-27, 2023, HPC on Wall Street event.

This long-standing east coast event features HPC maven Jay Boisseau as emcee. Jay is the Former Associate Director for Scientific Computing of the San Diego Supercomputing Center, the Founding Director of the Texas Advanced Computing Center (TACC, the fastest academic supercomputing center in the US), and former HPC & AI Technology strategist for Dell.

Please join Jay and many global financial luminaries, HPC experts, firms, and leading technology companies investing in FinTech solutions. The event will have four important sessions (including an HPC Squeeze panel). The sessions are expected to include:

  • From Open Source to Third Party: Capturing the Early Potential of Gen AI for FinServ
  • Quantum Computing Analyst Panel – One Year Later
  • HPC in the Great GPU Squeeze
  • The Data Management Challenges with Generative AI

Register now for HPC + AI Wall Street on September 26-27 at the InterContinental Times Square, NYC.

Editor's Note: Tabor Communications, publisher of HPCwire and EnterpriseAI, also produces the HPC + AI Wall Street event.

This article first appeared on sister site HPCwire.

YouTube is working on a plan to compensate artists and rightsholders for AI music

By Sarah Perez (@sarahintampa)

YouTube announced today how it plans to approach the impact AI technology is having on the music industry with regard to its video hosting platform and its existing partnerships across the music industry, including with artists, labels and other rightsholders. While the company is bullish on AI’s potential to “enhance music’s unique creative expression,” it also says it needs to ensure the integrity of artists’ work is protected.

To that end, the company is launching something it’s calling YouTube’s Music AI Incubator, to help inform its approach to AI by working with artists, songwriters, and producers across the industry to make decisions about how to proceed.

To kick off the program, YouTube is working with Universal Music Group (UMG) and its roster of talent, including songwriter and producer Anitta; songwriter, producer and entrepreneur Björn Ulvaeus; musician, composer and producer Don Was; Colombian musician Juanes; producer Louis Bell; composer Max Richter; songwriter and producer Rodney Jerkins; singer-songwriter Rosanne Cash; songwriter and producer Ryan Tedder of OneRepublic; rapper, musician, entrepreneur, and philanthropist Yo Gotti; and the estate of Frank Sinatra.

Unlike YouTube, UMG has been more hesitant to embrace AI. Earlier this year, it asked streaming services like Spotify to prevent AI companies from using its music to train their models, for example. It also issued copyright strikes on AI-generated YouTube videos that leveraged its artists’ work. When an AI song that replicated Drake and The Weeknd’s vocals went viral, UMG had the song pulled from Spotify and Apple Music.

At the heart of UMG’s complaints — similar to those we’re seeing across creative industries — is the problem of having artists’ work ingested to train AI models and then re-used to create new art without proper permission or compensation. It’s no surprise then, that UMG has teamed up with YouTube to develop some sort of structure that ensures rightsholders get paid.

YouTube alludes to its historical understanding of this tension between new technologies and compensation, noting that it’s “made massive investments over the years in the systems that help balance the interests of copyright holders with those of the creative community on YouTube.”

It references, for example, its Content ID system, which ensures rightsholders are paid for the use of their content on the platform. YouTube suggests that some similar system may work for AI music — at least for those “music partners who decide to participate,” it says.

The company also notes that trust and safety are key to making this system work, adding that it already has policies around technically manipulated content designed to mislead viewers. Similarly, it aims to scale those systems to ensure that generative AI isn’t used for things like copyright abuse, misinformation, and spam, as well. Instead, it plans to use AI technologies to identify this sort of content.

YouTube says it will share more details about how its new system for AI music will work in terms of the specific technologies, monetization opportunities, and policies being developed in the future.

“I’m incredibly excited about the opportunity of AI to supercharge creativity around the world, but recognize that YouTube and the promise of AI will only be successful if our partners are successful,” wrote YouTube CEO Neal Mohan, in an announcement. “Together, we can embrace this new technology in a way that supports artists, songwriters, producers, and the industry as a whole while driving value for fans and pushing the bounds of what’s creatively possible,” he said.

Beyond Numpy and Pandas: Unlocking the Potential of Lesser-Known Python Libraries

Image by OrMaVaredo on Pixabay

Python is one of the most used programming languages in the world and provides developers with a wide range of libraries.

However, when it comes to data manipulation and scientific computation, we generally think of libraries such as NumPy, Pandas, or SciPy.

In this article, we introduce 3 Python libraries you may be interested in.

1. Dask

Introducing Dask

Dask is a flexible parallel computing library that enables distributed computing and parallelism for large-scale data processing.

So, why should we use Dask? As they say on their website:

Python has grown to become the dominant language both in data analytics and general programming. This growth has been fueled by computational libraries like NumPy, pandas, and scikit-learn. However, these packages weren’t designed to scale beyond a single machine. Dask was developed to natively scale these packages and the surrounding ecosystem to multi-core machines and distributed clusters when datasets exceed memory.

So, one of the common uses of Dask, as they say, is:

Dask DataFrame is used in situations where pandas is commonly needed, usually when pandas fails due to data size or computation speed:
— Manipulating large datasets, even when those datasets don’t fit in memory
— Accelerating long computations by using many cores
— Distributed computing on large datasets with standard pandas operations like groupby, join, and time series computations

So, Dask is a good choice when we need to deal with huge Pandas data frames. This is because Dask:

Allows users to manipulate 100GB+ datasets on a laptop or 1TB+ datasets on a workstation

Which is a pretty impressive result.

What happens under the hood is that:

Dask DataFrames coordinate many pandas DataFrames/Series arranged along the index. A Dask DataFrame is partitioned row-wise, grouping rows by index value for efficiency. These pandas objects may live on disk or on other machines.

So, we have something like this:

The difference between a Dask and a Pandas data frame. Image by Author, freely inspired by one on the Dask website already quoted.

Some features of Dask in action

First of all, we need to install Dask. We can do it via pip or conda like so:

    $ pip install dask[complete]

or

    $ conda install dask

FEATURE ONE: OPENING A CSV FILE

The first feature we can show of Dask is how we can open a CSV. We can do it like so:

    import dask.dataframe as dd

    # Load a large CSV file using Dask
    df_dask = dd.read_csv('my_very_large_dataset.csv')

    # Perform operations on the Dask DataFrame
    mean_value_dask = df_dask['column_name'].mean().compute()

So, as we can see in the code, the way we use Dask is very similar to Pandas. In particular:

  • We use the method read_csv() exactly as in Pandas.
  • We select a column exactly as in Pandas. In fact, if we had a Pandas data frame called df we’d select a column this way: df['column_name'].
  • We apply the mean() method to the selected column as in Pandas, but here we also need to add the method compute(), because Dask evaluates lazily and only runs the calculation when asked.

Also, even though the way we open a CSV file is the same as in Pandas, under the hood Dask reads the file in partitions, so it can process a large dataset that exceeds the memory capacity of a single machine.

This means that the code looks almost identical. The practical difference is that a data frame too large to be opened in Pandas can still be processed in Dask.

FEATURE TWO: SCALING MACHINE LEARNING WORKFLOWS

We can also use Dask to create a classification dataset with a huge number of samples. We can then split it into the train and the test sets, fit an ML model on the train set, and calculate predictions for the test set.

We can do it like so:

    import dask_ml.datasets as dask_datasets
    from dask_ml.linear_model import LogisticRegression
    from dask_ml.model_selection import train_test_split

    # Load a classification dataset using Dask
    X, y = dask_datasets.make_classification(n_samples=100000, chunks=1000)

    # Split the data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Train a logistic regression model in parallel
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test).compute()

This example stresses the ability of Dask to handle huge datasets even in the case of a Machine Learning problem, by distributing computations across multiple cores.

In particular, we can create a “Dask dataset” for a classification case with the method dask_datasets.make_classification(), and we can specify the number of samples and the chunk size (even very large ones!).

As before, the predictions are obtained with the method compute().

NOTE: in this case, you may need to install the module dask_ml. You can do it like so:

    $ pip install dask-ml

FEATURE THREE: EFFICIENT IMAGE PROCESSING

The power of parallel processing that Dask utilizes can also be applied to images.

In particular, we could open multiple images, resize them, and save them resized. We can do it like so:

    import glob

    import dask
    from PIL import Image

    # Collect the paths of the images to process
    paths = sorted(glob.glob('image*.jpg'))

    # Wrap the resize-and-save step so that Dask can schedule it lazily.
    # dask.array has no resize() function, so Pillow does the resizing.
    @dask.delayed
    def resize_and_save(path, index):
        with Image.open(path) as image:
            resized_image = image.resize((300, 300))
            resized_image.save(f'resized_image_{index}.jpg')

    # Build one task per image
    tasks = [resize_and_save(path, i) for i, path in enumerate(paths)]

    # Run all the tasks in parallel
    dask.compute(*tasks)

So, here’s the process:

  1. We collect the paths of all the “image*.jpg” files in the current folder (or in a folder that you can specify) with the function glob.glob().
  2. We wrap a function that resizes an image to 300×300 and saves it with the decorator dask.delayed, and we create one lazy task per image using a list comprehension.
  3. We run all the tasks in parallel with dask.compute(), which plays the same role as the method compute() we used before.

2. SymPy

Introducing SymPy

If you need to make mathematical calculations and computations and want to stick to Python, you can try SymPy.

Indeed: why use other tools and software, when we can use our beloved Python?

As per what they write on their website, SymPy is:

A Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

But why use SymPy? They suggest:

SymPy is…

— Free: Licensed under BSD, SymPy is free both as in speech and as in beer.

— Python-based: SymPy is written entirely in Python and uses Python for its language.

— Lightweight: SymPy only depends on mpmath, a pure Python library for arbitrary floating point arithmetic, making it easy to use.

— A library: Beyond use as an interactive tool, SymPy can be embedded in other applications and extended with custom functions.

So, it basically has all the characteristics that can be loved by Python addicts!

Now, let’s see some of its features.

Some features of SymPy in action

First of all, we need to install it:

    $ pip install sympy

PAY ATTENTION: if you write $ pip install simpy you'll install another (completely different!) library. So, the second letter is a "y", not an "i".

FEATURE ONE: SOLVING AN ALGEBRAIC EQUATION

If we need to solve an algebraic equation, we can use SymPy like so:

    from sympy import symbols, Eq, solve

    # Define the symbols
    x, y = symbols('x y')

    # Define the equation
    equation = Eq(x**2 + y**2, 25)

    # Solve the equation
    solutions = solve(equation, (x, y))

    # Print solution
    print(solutions)

    >>>

    [(-sqrt(25 - y**2), y), (sqrt(25 - y**2), y)]

So, here’s the process:

  1. We define the symbols of the equation with the method symbols().
  2. We write the algebraic equation with the class Eq().
  3. We solve the equation with the method solve().

When I was at university I used different tools to solve these kinds of problems, and I have to say that SymPy, as we can see, is very readable and user-friendly.

But, indeed: it’s a Python library, so how could that be any different?
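
As a quick check of the general solution, one possible follow-up is to fix y and verify the result by substitution. The sketch below uses the same equation and picks y = 3 purely for illustration.

```python
from sympy import symbols, Eq, solve

x, y = symbols('x y')
equation = Eq(x**2 + y**2, 25)

# Fix y = 3 (an arbitrary illustrative value) and solve for x only
x_values = solve(equation.subs(y, 3), x)
print(x_values)

# Substitute each solution back into the equation to confirm it holds
checks = [equation.subs({x: value, y: 3}) for value in x_values]
print(checks)
```

Substituting the solutions back in is a cheap way to confirm that solve() returned what we expected.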

FEATURE TWO: CALCULATING DERIVATIVES

Calculating derivatives is another task we often need when analyzing data, and SymPy really simplifies this process. In fact, we can do it like so:

    from sympy import symbols, diff

    # Define the symbol
    x = symbols('x')

    # Define the function
    f = x**3 + 2*x**2 + 3*x + 4

    # Calculate the derivative
    derivative = diff(f, x)

    # Print derivative
    print(derivative)

    >>>

    3*x**2 + 4*x + 3

So, as we can see, the process is very simple and self-explanatory:

  1. We define the symbol of the function we’re differentiating with symbols().
  2. We define the function.
  3. We calculate the derivative with diff(), specifying the function and the symbol with respect to which we differentiate (this is an ordinary derivative, but we could also compute partial derivatives in the case of functions that have x and y variables).

And if we test it, we’ll see that the result arrives in a matter of 2 or 3 seconds. So, it’s also pretty fast.
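
The partial derivatives mentioned in the last step follow the same pattern. Here is a minimal sketch, using a made-up function of two variables chosen only for illustration:

```python
from sympy import symbols, diff

x, y = symbols('x y')

# A hypothetical function of two variables
f = x**2 * y + y**3

# Partial derivatives: the other symbol is treated as a constant
df_dx = diff(f, x)
df_dy = diff(f, y)

# Second derivative with respect to x
d2f_dx2 = diff(f, x, 2)

print(df_dx)
print(df_dy)
print(d2f_dx2)
```

The third argument of diff() is the order of the derivative, so diff(f, x, 2) differentiates twice with respect to x.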

FEATURE THREE: CALCULATING INTEGRALS

Of course, if SymPy can calculate derivatives, it can also calculate integrals. Let’s do it:

    from sympy import symbols, integrate, sin

    # Define the symbol
    x = symbols('x')

    # Perform symbolic integration
    integral = integrate(sin(x), x)

    # Print integral
    print(integral)

    >>>

    -cos(x)

So, here we use the method integrate(), specifying the function to integrate and the variable of integration.

Could it be any easier?!
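
The same method also handles definite integrals when we pass the integration limits as a tuple. The short sketch below uses the same integrand; the interval from 0 to pi is just an illustrative choice. It also differentiates the antiderivative to confirm that we get the original function back.

```python
from sympy import symbols, integrate, diff, sin, pi

x = symbols('x')

# Definite integral of sin(x) from 0 to pi
area = integrate(sin(x), (x, 0, pi))
print(area)

# Differentiating the antiderivative should give back the integrand
antiderivative = integrate(sin(x), x)
recovered = diff(antiderivative, x)
print(recovered == sin(x))
```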

3. Xarray

Introducing Xarray

Xarray is a Python library that extends the features and functionalities of NumPy, giving us the possibility to work with labeled arrays and datasets.

As they say on their website, in fact:

Xarray makes working with labeled multi-dimensional arrays in Python simple, efficient, and fun!

And also:

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.

In other words, it extends the functionality of NumPy arrays by adding labels or coordinates to the array dimensions. These labels provide metadata and enable more advanced analysis and manipulation of multi-dimensional data.

For example, in NumPy, arrays are accessed using integer-based indexing.

In Xarray, instead, each dimension can have a label associated with it, making it easier to understand and manipulate the data based on meaningful names.

For example, instead of accessing data with arr[0, 1, 2], we can use arr.sel(x=0, y=1, z=2) in Xarray, where x, y, and z are dimension labels.

This makes the code much more readable!
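
One detail worth keeping in mind: sel() looks values up by coordinate label, while isel() uses integer positions as NumPy does. The two only coincide when the labels happen to be 0, 1, 2, and so on. Here is a minimal sketch with made-up labels, assuming NumPy and Xarray are installed:

```python
import numpy as np
import xarray as xr

# A small 2x3 array with illustrative values
values = np.array([[1, 2, 3],
                   [4, 5, 6]])

# NumPy: positions only
by_position = values[1, 2]

# Xarray: the same data with named dimensions and coordinate labels
arr = xr.DataArray(
    values,
    dims=['x', 'y'],
    coords={'x': [10, 20], 'y': [100, 200, 300]},
)

by_label = arr.sel(x=20, y=300).item()   # select by coordinate label
by_index = arr.isel(x=1, y=2).item()     # select by integer position

print(by_position, by_label, by_index)
```

All three expressions return the same element, but only the sel() call says which x and y values we are actually asking for.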

So, let’s see some features of Xarray.

Some features of Xarray in action

As usual, to install it:

$ pip install xarray

FEATURE ONE: WORKING WITH LABELED COORDINATES

Suppose we want to create some data related to temperature and we want to label these with coordinates like latitude and longitude. We can do it like so:

    import xarray as xr
    import numpy as np

    # Create temperature data
    temperature = np.random.rand(100, 100) * 20 + 10

    # Create coordinate arrays for latitude and longitude
    latitudes = np.linspace(-90, 90, 100)
    longitudes = np.linspace(-180, 180, 100)

    # Create an Xarray data array with labeled coordinates
    da = xr.DataArray(
        temperature,
        dims=['latitude', 'longitude'],
        coords={'latitude': latitudes, 'longitude': longitudes}
    )

    # Access data using labeled coordinates
    subset = da.sel(latitude=slice(-45, 45), longitude=slice(-90, 0))

And if we print the subset we get:

    # Print data
    print(subset)

    >>>

    array([[13.45064786, 29.15218061, 14.77363206, ..., 12.00262833,
            16.42712411, 15.61353963],
           [23.47498117, 20.25554247, 14.44056286, ..., 19.04096482,
            15.60398491, 24.69535367],
           [25.48971105, 20.64944534, 21.2263141 , ..., 25.80933737,
            16.72629302, 29.48307134],
           ...,
           [10.19615833, 17.106716  , 10.79594252, ..., 29.6897709 ,
            20.68549602, 29.4015482 ],
           [26.54253304, 14.21939699, 11.085207  , ..., 15.56702191,
            19.64285595, 18.03809074],
           [26.50676351, 15.21217526, 23.63645069, ..., 17.22512125,
            13.96942377, 13.93766583]])
    Coordinates:
      * latitude   (latitude) float64 -44.55 -42.73 -40.91 ... 40.91 42.73 44.55
      * longitude  (longitude) float64 -89.09 -85.45 -81.82 ... -9.091 -5.455 -1.818

So, let’s see the process step-by-step:

  1. We’ve created the temperature values as a NumPy array.
  2. We’ve defined the latitude and longitude values as NumPy arrays.
  3. We’ve stored all the data in an Xarray array with the class DataArray().
  4. We’ve selected a subset of the latitudes and longitudes with the method sel() that selects the values we want for our subset.

The result is also easily readable, so labeling is really helpful in a lot of cases.
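
Label-based selection also works when the value we ask for is not exactly on the grid. The following sketch assumes a much smaller grid than the one above, with deterministic values so the result is easy to follow; the names grid and point and the query coordinates are illustrative.

```python
import numpy as np
import xarray as xr

latitudes = np.linspace(-90, 90, 5)      # -90, -45, 0, 45, 90
longitudes = np.linspace(-180, 180, 5)   # -180, -90, 0, 90, 180

# Deterministic stand-in for temperature data
temperature = np.arange(25, dtype=float).reshape(5, 5)

grid = xr.DataArray(
    temperature,
    dims=['latitude', 'longitude'],
    coords={'latitude': latitudes, 'longitude': longitudes},
)

# Slices select every label inside the range, endpoints included
subset = grid.sel(latitude=slice(-45, 45), longitude=slice(-90, 0))
print(subset.shape)

# method='nearest' picks the grid point closest to the requested labels
point = grid.sel(latitude=40, longitude=-100, method='nearest')
print(float(point.latitude), float(point.longitude), float(point))
```

Note that, unlike positional slicing in NumPy, a label slice in sel() includes both endpoints.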

FEATURE TWO: HANDLING MISSING DATA

Suppose we’re collecting data related to temperatures during the year. We want to know if we have some null values in our array. Here's how we can do so:

    import xarray as xr
    import numpy as np
    import pandas as pd

    # Create temperature data with missing values
    temperature = np.random.rand(365, 50, 50) * 20 + 10
    temperature[0:10, :, :] = np.nan  # Set the first 10 days as missing values

    # Create time, latitude, and longitude coordinate arrays
    times = pd.date_range('2023-01-01', periods=365, freq='D')
    latitudes = np.linspace(-90, 90, 50)
    longitudes = np.linspace(-180, 180, 50)

    # Create an Xarray data array with missing values
    da = xr.DataArray(
        temperature,
        dims=['time', 'latitude', 'longitude'],
        coords={'time': times, 'latitude': latitudes, 'longitude': longitudes}
    )

    # Count the number of missing values along the time dimension
    missing_count = da.isnull().sum(dim='time')

    # Print missing values
    print(missing_count)

    >>>

    array([[10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           ...,
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10],
           [10, 10, 10, ..., 10, 10, 10]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

And so we find that every latitude and longitude point has 10 null values along the time dimension.

Also, if we take a close look at the code, we can see that Xarray offers Pandas-like methods such as isnull() and sum(). Chained as isnull().sum(dim='time'), as in this case, they count the number of missing values along the time dimension.
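
Once the missing values are located, a natural next step is to drop or fill them. The sketch below assumes a tiny one-dimensional series with made-up values. Filling with the mean is just one possible strategy, not a recommendation.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2023-01-01', periods=5, freq='D')
series = xr.DataArray(
    [np.nan, np.nan, 12.0, 14.0, 16.0],
    dims=['time'],
    coords={'time': times},
)

# Count the missing values
missing = int(series.isnull().sum())

# Option 1: drop the time steps that contain missing values
dropped = series.dropna(dim='time')

# Option 2: replace missing values with the mean of the valid ones
filled = series.fillna(series.mean())

print(missing, dropped.sizes['time'], filled.values)
```

By default mean() skips the missing values here, so the fill value is the average of the three valid readings.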

FEATURE THREE: HANDLING AND ANALYZING MULTI-DIMENSIONAL DATA

The temptation to handle and analyze multi-dimensional data is high when we have the possibility to label our arrays. So, why not try it?

For example, suppose we’re still collecting data related to temperatures at certain latitudes and longitudes.

We may want to calculate the mean, the max, and the min temperatures. We can do it like so:

    import xarray as xr
    import numpy as np
    import pandas as pd

    # Create synthetic temperature data
    temperature = np.random.rand(365, 50, 50) * 20 + 10

    # Create time, latitude, and longitude coordinate arrays
    times = pd.date_range('2023-01-01', periods=365, freq='D')
    latitudes = np.linspace(-90, 90, 50)
    longitudes = np.linspace(-180, 180, 50)

    # Create an Xarray dataset
    ds = xr.Dataset(
        {
            'temperature': (['time', 'latitude', 'longitude'], temperature),
        },
        coords={
            'time': times,
            'latitude': latitudes,
            'longitude': longitudes,
        }
    )

    # Perform statistical analysis on the temperature data
    mean_temperature = ds['temperature'].mean(dim='time')
    max_temperature = ds['temperature'].max(dim='time')
    min_temperature = ds['temperature'].min(dim='time')

    # Print values
    print(f"mean temperature:\n {mean_temperature}\n")
    print(f"max temperature:\n {max_temperature}\n")
    print(f"min temperature:\n {min_temperature}\n")

    >>>

    mean temperature:
     array([[19.99931701, 20.36395016, 20.04110699, ..., 19.98811842,
            20.08895803, 19.86064693],
           [19.84016491, 19.87077812, 20.27445405, ..., 19.8071972 ,
            19.62665953, 19.58231185],
           [19.63911165, 19.62051976, 19.61247548, ..., 19.85043831,
            20.13086891, 19.80267099],
           ...,
           [20.18590514, 20.05931149, 20.17133483, ..., 20.52858247,
            19.83882433, 20.66808513],
           [19.56455575, 19.90091128, 20.32566232, ..., 19.88689221,
            19.78811145, 19.91205212],
           [19.82268297, 20.14242279, 19.60842148, ..., 19.68290006,
            20.00327294, 19.68955107]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

    max temperature:
     array([[29.98465531, 29.97609171, 29.96821276, ..., 29.86639343,
            29.95069558, 29.98807808],
           [29.91802049, 29.92870312, 29.87625447, ..., 29.92519055,
            29.9964299 , 29.99792388],
           [29.96647016, 29.7934891 , 29.89731136, ..., 29.99174546,
            29.97267052, 29.96058079],
           ...,
           [29.91699117, 29.98920555, 29.83798369, ..., 29.90271746,
            29.93747041, 29.97244906],
           [29.99171911, 29.99051943, 29.92706773, ..., 29.90578739,
            29.99433847, 29.94506567],
           [29.99438621, 29.98798699, 29.97664488, ..., 29.98669576,
            29.91296382, 29.93100249]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

    min temperature:
     array([[10.0326431 , 10.07666029, 10.02795524, ..., 10.17215336,
            10.00264909, 10.05387097],
           [10.00355858, 10.00610942, 10.02567816, ..., 10.29100316,
            10.00861792, 10.16955806],
           [10.01636216, 10.02856619, 10.00389027, ..., 10.0929342 ,
            10.01504103, 10.06219179],
           ...,
           [10.00477003, 10.0303088 , 10.04494723, ..., 10.05720692,
            10.122994  , 10.04947012],
           [10.00422182, 10.0211205 , 10.00183528, ..., 10.03818058,
            10.02632697, 10.06722953],
           [10.10994581, 10.12445222, 10.03002468, ..., 10.06937041,
            10.04924046, 10.00645499]])
    Coordinates:
      * latitude   (latitude) float64 -90.0 -86.33 -82.65 ... 82.65 86.33 90.0
      * longitude  (longitude) float64 -180.0 -172.7 -165.3 ... 165.3 172.7 180.0

And we obtained what we wanted, also in a clearly readable way.

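And again, as before, to calculate the max, min, and mean values of temperatures we’ve used Xarray’s own reduction methods (mean(), max(), and min()), which mirror the NumPy and Pandas functions of the same names and are applied here along the named time dimension.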

Conclusions

In this article, we’ve shown three libraries for scientific calculation and computation.

SymPy can substitute for other tools and software, letting us use Python code for mathematical calculations. Dask and Xarray, in turn, extend the functionality of other libraries, helping us in situations where the better-known Python libraries for data analysis and manipulation fall short.

Federico Trotta has loved writing since he was a young boy in school, writing detective stories as class exams. Thanks to his curiosity, he discovered programming and AI. Having a burning passion for writing, he couldn't avoid starting to write about these topics, so he decided to change his career to become a Technical Writer. His purpose is to educate people on Python programming, Machine Learning, and Data Science, through writing. Find more about him at federicotrotta.com.

Original. Reposted with permission.

Project Indus: Tech Mahindra’s Initiative to Challenge OpenAI

Indian IT giant Tech Mahindra is working on an indigenous Large Language Model (LLM) that would have the ability to speak in many Indic languages, most notably Hindi.

Called Project Indus, the model will have the ability to speak in 40 different Indic languages, to begin with. More languages that have originated in the country will also be added subsequently.

Tech Mahindra head CP Gurnani recently took to Twitter to request speakers of these languages to contribute to the project with their expressions, vocabulary, and conversations.

Building an LLM needs a big dataset, and the scarcity of Indic language datasets is a challenge. The approach taken by the IT giant is similar to that of Bhashini, a project launched by Narendra Modi to build datasets on Indic languages.

Speakers of languages such as Dogri (Jammu & Kashmir), Kinnauri, Kangri, Chambeli, and Garhwali (Himachal), Kumaoni and Jaunsari (Uttar Pradesh), and Bhojpuri, Maithili, and Magahi (Bihar), among others, can contribute to the project.

Previously, Gurnani, responding to a Sam Altman tweet, confirmed that Tech Mahindra is building an LLM specifically for India.

The post Project Indus: Tech Mahindra’s Initiative to Challenge OpenAI appeared first on Analytics India Magazine.

The impacts of quantum computing on the future of data science

Key takeaways

  • Quantum computing and data science can revolutionize data analysis by processing, analyzing, and extracting insights from massive datasets more efficiently.
  • Quantum bits (qubits) with superposition and entanglement capabilities enable intricate calculations and transform the limits of data analysis.
  • Data science faces challenges in efficiently processing vast datasets due to the limitations of traditional computing methods.
  • Quantum computing and data science can unleash immense promise by accelerating data analysis and decision-making processes.
  • Quantum computing has real-world applications in cybersecurity, drug discovery, finance, weather forecasting, and machine learning.
  • While the potential is undeniable, there are challenges regarding error correction, integration with classical computing, and ethical considerations.

In an era marked by exponential technological advancements, the convergence of quantum computing and data science is a pivotal point of transformation. The synergy between these two fields promises to revolutionize how we process, analyze, and extract insights from massive datasets. With quantum computing’s unique ability to tackle complex computations at speeds previously considered unattainable, the future of data science is poised for unprecedented innovation.

Understanding quantum computing

Quantum computing, an intricate branch of computation that capitalizes on the principles of quantum mechanics, is redefining the limits of computation. At its core are quantum bits or qubits, which, unlike classical bits, can exist in multiple states simultaneously thanks to superposition.

Quantum entanglement, another fundamental property, allows qubits to become interconnected, irrespective of distance, enabling intricate computations.
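
Superposition and entanglement can be made concrete with a few lines of ordinary Python. The sketch below is only a classical simulation of two qubits, assuming the standard textbook Hadamard and CNOT gates; the helper names (hadamard, tensor, cnot, probabilities) are illustrative and do not come from any quantum library.

```python
import math

# A single-qubit state is a pair of amplitudes (for |0> and |1>).
ZERO = [1.0, 0.0]

def hadamard(state):
    """Apply the Hadamard gate, which turns |0> into an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return [s * (a + b), s * (a - b)]

def tensor(q1, q2):
    """Combine two one-qubit states into a two-qubit state (|00>, |01>, |10>, |11>)."""
    return [x * y for x in q1 for y in q2]

def cnot(state):
    """CNOT with the first qubit as control: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(a) ** 2 for a in state]

plus = hadamard(ZERO)             # superposition of |0> and |1>
bell = cnot(tensor(plus, ZERO))   # entangled pair (|00> + |11>) / sqrt(2)

print([round(p, 3) for p in probabilities(plus)])
print([round(p, 3) for p in probabilities(bell)])
```

In the entangled state only the outcomes 00 and 11 have non-zero probability, so measuring one qubit fixes the result of the other.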

The current landscape of data science

Data science is a cornerstone of decision-making, predictive analytics, and pattern recognition across industries. However, processing vast amounts of data efficiently and effectively has posed challenges, with traditional computing methods struggling to keep up.

The algorithms powering data analysis, machine learning, and artificial intelligence have thrived but are constrained by the limitations of classical hardware.

The promised synergy: Quantum computing and data science

The marriage of quantum computing and data science promises to overcome these limitations and drive innovation to unprecedented levels. Quantum computing’s potential to perform complex calculations exponentially faster than classical computers presents an opportunity to accelerate data science applications like data analysis and decision-making processes.

The synergy between quantum computing and data science encompasses quantum-enhanced machine learning algorithms, more efficient optimization techniques, and innovative data clustering and dimensionality reduction approaches.

Real-world applications and case studies

The impacts of quantum computing on data science are already manifesting across various domains. Here are some notable real-world applications of quantum computing in data science:

Cybersecurity

Breaking current encryption standards. Current encryption standards, such as RSA and ECC, are based on mathematical problems that are believed to be difficult to solve for classical computers.
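
To see why the difficulty of factoring matters, consider a toy RSA key with deliberately tiny primes. This is a sketch under stated assumptions, not a real attack: the primes, exponent, and message are hypothetical, and the factor helper uses plain trial division. On a large quantum computer, Shor's algorithm is what would make factoring realistic key sizes feasible; no quantum step is simulated here.

```python
def factor(n):
    """Trial division: feasible only because n is tiny."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n is prime")

# Key generation with deliberately tiny primes (real keys use primes
# that are hundreds of digits long).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)

# An attacker who sees only (n, e) and can factor n rebuilds the private key.
p_found, q_found = factor(n)
d_found = pow(e, -1, (p_found - 1) * (q_found - 1))
recovered = pow(ciphertext, d_found, n)
print(recovered == message)
```

The security of the scheme rests entirely on factor(n) being impractical for large n, which is the assumption a sufficiently powerful quantum computer would undermine.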

In 2016, a team of researchers from Google AI announced that they had used a quantum computer to break a weakened version of the RSA encryption standard. This was a significant milestone, showing that quantum computers could break current encryption standards.

Developing new, more secure encryption standards. Quantum computing is also driving the development of new, more secure encryption standards that resist attack by quantum computers. These new standards are based on mathematical problems believed to be intractable even for quantum computers.

In July 2022, the National Institute of Standards and Technology (NIST) announced the first algorithms selected for its quantum-resistant encryption standards. These algorithms are designed to resist attack by quantum computers.

Drug discovery

Simulating the behavior of molecules. Quantum computers can be used to simulate the behavior of molecules with unprecedented accuracy. This could help scientists to design new drugs that are more effective and less toxic. Quantum computers can simulate the interactions of drugs with proteins, which is a critical step in drug discovery.

In 2016, Google AI announced that it had used a quantum computer to simulate a molecule of hydrogen for the first time. This was a significant milestone, as it showed that quantum computers could be used to simulate the behavior of molecules, which is a critical task in drug discovery and materials science.

Finding new drug targets. Quantum computers could be used to find new drug targets, the molecules that drugs interact with to produce their effects. This could help scientists develop new drugs for diseases that have no known treatment. For example, quantum computers could screen large libraries of molecules to find those that interact with a specific protein target.

In 2020, a team of researchers from the pharmaceutical company AstraZeneca used a quantum computer to find new drug targets for cancer. This was the first time a quantum computer had been used to find new drug targets.

Some of the best schools in health informatics, like Stanford University and Johns Hopkins University, already include quantum computing for drug discovery in their curricula.

Finance

Portfolio optimization. Quantum computers could optimize investment portfolios by finding the best combination of assets to minimize risk and maximize return. This could help investors to make better investment decisions and to improve their returns.
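
Portfolio selection of this kind is often written as a binary optimization problem: each asset is either in or out, and the objective rewards expected return while penalizing risk. The sketch below solves a tiny instance by classical brute force to show the shape of the problem a quantum optimizer would be handed. The returns, covariances, and RISK_AVERSION value are hypothetical, and no quantum hardware or library is involved.

```python
from itertools import product

# Hypothetical expected returns and covariance matrix for four assets.
returns = [0.10, 0.12, 0.07, 0.09]
cov = [
    [0.040, 0.018, 0.002, 0.004],
    [0.018, 0.090, 0.003, 0.006],
    [0.002, 0.003, 0.010, 0.001],
    [0.004, 0.006, 0.001, 0.020],
]
RISK_AVERSION = 1.0  # assumed trade-off between return and risk

def score(selection):
    """Expected return minus a risk penalty for a 0/1 selection vector."""
    size = len(selection)
    gain = sum(r * x for r, x in zip(returns, selection))
    risk = sum(cov[i][j] * selection[i] * selection[j]
               for i in range(size) for j in range(size))
    return gain - RISK_AVERSION * risk

# Classical brute force: evaluate all 2**n selections.
best = max(product([0, 1], repeat=len(returns)), key=score)
print(best, round(score(best), 4))
```

The number of candidate selections doubles with every added asset, which is why this family of problems is a common target for quantum and quantum-inspired optimization research.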

In 2019, a team of University of Waterloo researchers used a quantum computer to develop a new algorithm for portfolio optimization. This algorithm found better investment portfolios than traditional algorithms and could be used to improve the returns of investment funds.

Financial trading. Quantum computers could be used to develop new financial trading strategies that are more efficient and profitable. For example, quantum computers could analyze large amounts of market data to identify trading opportunities that traditional methods would miss.

In 2020, a team of researchers from the Massachusetts Institute of Technology (MIT) used a quantum computer to develop a new algorithm for financial trading. This algorithm could identify trading opportunities that were missed by traditional methods and could be used to generate profits for financial institutions.

Weather forecasting

Improved weather forecasting. Quantum computers could simulate the weather more accurately than classical computers. This could help to improve forecasts of extreme weather events, such as hurricanes and tornadoes. It could also improve climate change forecasts, which could help us mitigate its effects.

In 2019, a team of UC Berkeley researchers used a quantum computer to develop a new algorithm for weather forecasting. This algorithm produced more accurate forecasts than traditional algorithms, and it could be used to improve preparedness for extreme weather events.

Mitigation of climate change. Quantum computers could be used to develop new ways to mitigate the effects of climate change. For example, quantum computers could help design new materials that capture and store carbon dioxide more efficiently.

In 2020, a team of researchers from the National Center for Atmospheric Research (NCAR) used a quantum computer to develop a new algorithm for weather forecasting. This algorithm was able to produce more accurate forecasts than traditional algorithms.

Machine learning

Training machine learning models more quickly and efficiently. Quantum computers could train machine learning models more quickly and efficiently than classical computers. Quantum computers can perform certain tasks, such as searching large unstructured datasets, much faster than classical computers.
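
The search speedup usually cited here is Grover's algorithm, which finds a marked item among N candidates in roughly the square root of N steps rather than N. The following sketch simulates the algorithm's amplitudes classically for N = 8, so it demonstrates the mechanism rather than any actual speedup; the function name grover_search and the choice of marked item are illustrative.

```python
import math

def grover_search(n_items, marked):
    """Simulate Grover's algorithm on a state vector of real amplitudes."""
    amps = [1 / math.sqrt(n_items)] * n_items       # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        amps[marked] = -amps[marked]                # oracle: flip the marked amplitude
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]         # diffusion: reflect about the mean
    return iterations, [a * a for a in amps]

iterations, probs = grover_search(8, marked=5)
print(iterations, round(probs[5], 3))
```

After only two iterations the marked item is measured with probability above 90 percent, whereas a classical search over eight items needs four or five lookups on average.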

In 2019, a team of researchers from Google AI used a quantum computer to train a machine learning model to classify images of handwritten digits. This was a significant milestone, showing that quantum computers could be used to train machine learning models.

Developing new machine learning algorithms. Quantum computers could be used to develop new machine learning algorithms that are more powerful and efficient than traditional algorithms. This is because quantum computers can exploit the inherent parallelism of quantum mechanics to solve certain problems more efficiently.

In 2020, a team of researchers from the University of Toronto used a quantum computer to develop a new machine learning algorithm for natural language processing. This algorithm achieved state-of-the-art results on a natural language processing task.

Challenges and considerations

Despite the promises, challenges remain on the path to fully realizing the potential of quantum computing in data science. Current quantum computing technologies are still nascent, prone to errors, and require sophisticated error correction methods.

Integrating classical and quantum computing architectures poses significant technical hurdles, and ethical considerations loom over the implications of quantum-enhanced data analysis.

The road ahead: Future prospects and developments

The future holds immense potential for the growth of quantum computing in data science. Continued advancements in quantum hardware, coupled with novel error mitigation techniques, are expected to improve the reliability of quantum systems.

Collaborations between quantum computing and data science communities will foster innovation in algorithm development, leading to more efficient quantum machine learning models. As we explore hybrid quantum-classical data analysis pipelines, the boundaries of what we can achieve in data science will continue to expand.

Harnessing the power of quantum computing for data science

A new frontier of possibilities emerges in the interplay between quantum computing and data science. The transformational impacts of quantum computing on the future of data science are undeniable.

As we venture into this uncharted territory, researchers, scientists, and industry leaders must collaborate to harness the full potential of quantum computing to solve complex problems, redefine data analysis paradigms, and reshape decision-making across domains. The journey ahead involves innovation, exploration, and the relentless pursuit of uncovering hidden insights within vast datasets.

Cybersecurity’s Rising Significance in the World of Artificial Intelligence

August 21, 2023 by Shivani Shukla

According to a 2023 business survey, 62 percent of enterprises have fully implemented artificial intelligence (AI) for cybersecurity or are exploring additional uses for the technology. With advancements in AI technologies, however, come more ways for sensitive information to be misused.

Globally, organizations are leveraging AI and implementing automated security measures into their infrastructure to reduce vulnerabilities. As AI is emerging, threats continue to take on various forms. A recent IBM report states that the average cost of a data breach is a staggering $4.45 million. The proliferation of generative AI (GAI) will likely consumerize AI-enabled automated attacks, including a level of personalization that would be difficult to detect by humans without GAI assistance.

While AI serves as a more generalized term for intelligence-based tech behavior, GAI is a subspecialty that extends the concept of AI to generate new content that spans across various modes and even combines them. The primary cause of concern within cybersecurity comes from GAI’s ability to “mutate,” which includes self-modifying code. This means that when a model-driven attack is unable to infiltrate a system, it alters its operative behavior to be successful.

The growing risk of cyberattacks coincides with the more widespread availability of AI and GAI through GPT, Bard, or the range of open-source options. It is suspected that cybercrime tools like WormGPT and PoisonGPT were developed using the open-source GPT-J language model. Some GAI language models, particularly ChatGPT and Bard, have anti-abuse restrictions, yet the sophistication that GAI offers in devising attacks, generating new exploits, bypassing security structures, and enabling clever prompt engineering might continue to pose a threat.

Issues like these play into the overarching problem of determining what is real and what is fake. As the lines between truth and hoax are blurred, it’s important to ensure the accuracy and credibility of GAI models in cybersecurity when detecting fraudulent information. Capitalizing on AI and GAI algorithms for protection against generated attacks from these technologies delivers a promising way forward.

Standards and Initiatives To Use AI in Cybersecurity

According to a recent Cloud Security Alliance (CSA) report, “generative AI models can be used to significantly enhance the scanning and filtering of security vulnerabilities.” In the report, the CSA demonstrates how OpenAI's models and other large language models (LLMs) can serve as effective vulnerability scanners for potential threats and risks. A primary example would be an AI scanner developed to quickly detect insecure code patterns so that developers can eliminate potential holes or weaknesses before they become a significant risk.

Earlier this year, the National Institute of Standards and Technology launched the Trustworthy and Responsible AI Center, which includes its AI Risk Management Framework (RMF). The RMF assists AI users and developers in understanding and addressing the common risks involved with AI systems while providing best practices for reducing them. Despite the positive intentions of the RMF, the framework remains insufficient. This past June, the Biden-Harris administration announced that a group of developers will begin developing guidance for organizations to assist in assessing and tackling the risks associated with GAI.

Cyberattacks will become cheaper in the future as entry barriers fall, and these frameworks should prove to be useful guiding mechanisms. Still, an increasing rate of AI/GAI-induced attacks will require developers and organizations to rapidly build and grow on these foundations.

The Benefits of GAI in Cybersecurity

With GAI reducing detection and response times to ensure that holes and vulnerabilities are efficiently patched, using GAI to prevent AI-generated attacks is inevitable. Some of the benefits of this approach include:

  • Detection and response. AI algorithms can be designed to analyze large and diverse datasets and capture behavior of users in the system to detect unusual activities. Extending that further, GAI can now generate a coordinated defense or decoy against those unusual activities in a timely manner. Infiltrations sitting in an organization’s IT systems for days, or even months, can be avoided.
  • Threat simulation and training. Models can simulate threat scenarios and generate synthetic datasets. Generated realistic cyberattack scenarios, including malware code and phishing emails, can radically improve the quality of response. Because AI and GAI learn adaptively, the scenarios are made progressively complex and difficult to resolve, building a more robust internal system. AI and GAI can operate efficiently in dynamic situations, thus supporting cybersecurity exercises intended primarily for training purposes, such as Quantum Dawn.
  • Predictive capabilities. Composite IT/IS networks of organizations require predictive capabilities for assessing the potential vulnerabilities that continuously evolve and shift over time. Consistent risk assessment and threat intelligence support and sustain proactive measures.
  • Human-machine, machine-machine collaborations. AI and GAI do not guarantee a completely automated system that excludes the need for human input. Their pattern recognition and generation capabilities might be more advanced, but organizations still require human creativity and their interventions. In this context, human-machine collaboration reduces overrides and clogged-up networks caused by false positives (AI-determined attack that isn’t really an attack), while machine-machine collaboration reduces false negatives across organizations given their strong combined pattern recognition capabilities.
  • Collaborative defense and cooperative approaches. The human-machine and machine-machine collaborations can ensure cooperative defense when implemented among disparate or competing organizations. Through collaboration, these competitors can work together defensively. Not being a zero-sum situation, this calls for cooperative game theory, an approach in which groups of entities (organizations) form “coalitions” and act as primary and independent decision-making units. By modeling various cyberattack scenarios as games, it is possible to predict the attacker’s actions and identify optimal defense strategies. This technique has been shown to support collaboration and cooperative behavior and the final result provides the foundation for cybersecurity policies and valuation. AI systems designed to cooperate with other AI models of competing organizations could provide an extremely stable cooperative equilibrium. Currently, such “coalitions” are mostly driven through information exchanges. AI-to-AI cooperation can enable more complex detection and response mechanisms.
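
The detection idea in the first bullet, capturing normal user behavior and flagging unusual activity, can be sketched with a simple statistical baseline. This is a minimal illustration assuming a z-score rule with a threshold of three standard deviations; the login counts are hypothetical, and production systems would use far richer models.

```python
from statistics import mean, stdev

def flag_anomalies(history, recent, threshold=3.0):
    """Return recent values more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    return [x for x in recent if sigma > 0 and abs(x - mu) / sigma > threshold]

# Hypothetical daily login counts for one user account.
baseline_logins = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
new_logins = [5, 6, 48, 4]

print(flag_anomalies(baseline_logins, new_logins))
```

Only the spike of 48 logins is flagged; the other days fall within the account's normal range.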

These benefits contribute to GAI’s overall impact on cybersecurity, but it is the collaborative effort between developers and the AI they implement that optimizes cyber defense.
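
Modeling an attack scenario as a game can be illustrated with a small payoff table. Note that this sketch is a simpler two-player, zero-sum attacker-versus-defender game solved with a pure-strategy minimax rule, not the cooperative coalition model described in the list above; the strategies and loss values are hypothetical.

```python
# Hypothetical expected loss to the defender (arbitrary units)
# for each defense strategy (rows) against each attack type (columns).
ATTACKS = ["phishing", "ransomware", "ddos"]
LOSSES = {
    "email filtering": [1, 8, 6],
    "offline backups": [7, 2, 6],
    "traffic scrubbing": [7, 8, 1],
    "balanced budget": [4, 5, 4],
}

def worst_case(defense):
    """Loss if the attacker picks the most damaging attack against this defense."""
    return max(LOSSES[defense])

def best_response(defense):
    """The attack a rational attacker would choose against this defense."""
    row = LOSSES[defense]
    return ATTACKS[row.index(max(row))]

# Minimax: choose the defense whose worst-case loss is smallest.
minimax_defense = min(LOSSES, key=worst_case)
print(minimax_defense, worst_case(minimax_defense), best_response(minimax_defense))
```

The table predicts the attacker's best response to each defense, and the minimax choice is the defense that limits damage even against that best response.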

A Modern Approach to Cybersecurity

Through 2027, the global market for AI-enabled cybersecurity technologies is expected to grow at a compound annual growth rate of 23.6 percent. While it is impossible to fully predict where generative AI and its role in cybersecurity will go from here, it’s safe to say that AI does not need to be feared or viewed as a potential threat. A modern approach to cybersecurity is centered around standardized AI modeling with the potential for continuous innovation and development.

About the Author

Shivani Shukla specializes in operations research, statistics, and AI with several years of experience in academic and industry research. She currently serves as the director of undergraduate programs in business analytics as well as an associate professor in business analytics and IS.

Now Indians Can Secure Their Data On The Dark Web, All Thanks To Google

Last year, around 600,000 Indian users had stolen data sold on the dark web, making India the most heavily impacted country in this regard, according to NordVPN. The mere thought of one’s personal data circulating on the dark web is unsettling. Google has stepped up its efforts to address this issue with a valuable security feature: the Dark Web Report.

Subscribers can activate the feature by creating a profile through the Google One platform, clicking the “Set up > Start monitoring” option under the “Dark web report” section.

Google has now launched the Dark Web Report feature, announced at Google I/O 2023, for subscribers of Google One in India. The feature is designed to let users closely monitor their personal information on the dark web, a hub for illicit activities such as the trading of sensitive personal data.

Google highlighted that crucial details like an individual’s full name, birthdate, contact number, email address, and other sensitive information could potentially be exploited for criminal purposes such as financial fraud and identity theft. Google’s system will cross-reference what it finds on the dark web with the private data stored in users’ profiles. When a match is detected, the security tool will notify users. This mechanism will allow users to take action and protect themselves against potential fraud.
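
The cross-referencing step can be pictured as matching a monitored profile against a set of leaked records. The sketch below is purely illustrative and is not Google's implementation: it assumes values are normalized and compared as SHA-256 fingerprints, and the profile, leaked records, and function names are all hypothetical.

```python
import hashlib

def fingerprint(value):
    """Normalize a value and hash it so raw personal data need not be compared directly."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_exposed(profile, leaked_fingerprints):
    """Return the profile fields whose fingerprint appears in the leaked set."""
    return [field for field, value in profile.items()
            if fingerprint(value) in leaked_fingerprints]

# Hypothetical monitored profile and hypothetical leaked records.
profile = {
    "email": "Asha@example.com",
    "phone": "+91-00000-00000",
    "name": "Asha Example",
}
leaked = {fingerprint("asha@example.com"), fingerprint("someone.else@example.com")}

print(find_exposed(profile, leaked))
```

A non-empty result is the point at which a monitoring tool would notify the user and suggest protective steps.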

In March 2023, the Dark Web Report feature began rolling out to members of all Google One plans in the United States. At the latest I/O, Jen Fitzpatrick, Google’s Senior Vice President of Core Services, noted that the Dark Web Report was initially available only to Google One subscribers in the U.S. and said the company would expand access to the tool in the coming weeks. This means that anyone with a Gmail account in the U.S. will be able to run scans to check whether their Gmail address appears on the dark web, and will receive guidance on the steps needed to safeguard themselves.

The post Now Indians Can Secure Their Data On The Dark Web, All Thanks To Google appeared first on Analytics India Magazine.