Data Science Hiring Process at Dream11

Dream Sports, the parent company of Dream11, is a prominent sports tech unicorn in India that houses brands such as Dream11 and FanCode, making it something of a Willy Wonka's Chocolate Factory of sports engagement. Founded in 2008 by Harsh Jain and Bhavit Sheth, Dream Sports is headquartered in Mumbai, at an office affectionately dubbed "The Stadium" by its team.

At the forefront, its flagship fantasy sports platform, Dream11, accommodates a massive user base of over 190 million individuals. This platform offers users the exciting opportunity to participate in fantasy versions of cricket, hockey, football, kabaddi, handball, basketball, volleyball, rugby, futsal, American football, and baseball.

And for the seamless operation of Dream Sports, data science has become a cornerstone in tackling fundamental challenges for platforms like Dream11, where user experience and engagement play pivotal roles.

AIM got in touch with Amit Sharma, chief technology officer at Dream11 (Dream Sports), to learn more about its AI and ML operations, hiring strategy, work culture, and more. Sharma has led the creation of the sports tech platform since 2016. Earlier, he spent over a decade developing complex distributed systems for major companies like Yahoo! and Netflix in California.

Dream Sports is currently looking for a VP of Data Science in Mumbai to lead data science roadmap development, drive experimentation, propose ML solutions, mentor the team, and ensure goal alignment. Required skills include proficiency in programming languages, data visualisation, and machine learning, along with strong interpersonal skills. Familiarity with technologies like CloudFront, API Gateway, Python, MySQL, Kafka, Spark, and Redshift is essential.

Inside Dream11’s AI & Analytics Play

Aligned with its mission of enhancing the sports experience, Dream Sports operates akin to a “high-performing sports team” composed of “Coaches” (CXOs) and “Captains” (team leaders) who guide over 1000 “Sportans” (employees). That includes a skilled roster of engineers, business and data analysts, applied scientists and machine learning experts.

One of the primary hurdles that this team has successfully tackled through data science is the issue of personalisation, discovery, and fraud detection within its application. When it comes to user interaction, the app encompasses a multitude of features, such as matches and contests, which places a notable cognitive load on users. These elements change frequently, making it hard for users to fully grasp them all and decide on the most suitable options. Dream11 was an early adopter of data science in its product development journey, even when these technologies were relatively nascent within the tech ecosystem.

“In order to improve user experience and engagement, we have deployed over 100 AI and ML models to enable the contextual discovery of relevant features. We have personalised multiple user journeys throughout the app by analysing user cohorts and behaviours,” said Sharma.

Underpinning these offerings are sophisticated ML systems that process over 1,000 features across a vast user base. Monitoring for model and data drift has also been put in place to keep daily operations running smoothly. The team of data scientists has additionally built robust fraud-detection systems that combine knowledge graphs, large-scale similarity search, and pattern-recognition algorithms.

Data scientists at Dream11 continually experiment with various personalisation strategies, employing A/B testing methodologies to compare distinct user experiences. This iterative approach helps in identifying the most effective strategies while quantifying their impact on user engagement.
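To give a flavour of the kind of analysis such experimentation involves, here is a minimal sketch of how the outcome of a single A/B test might be checked for statistical significance in Python with SciPy (one of the libraries mentioned later in this piece). The variant names and conversion counts are hypothetical, not Dream11 data.

```python
from scipy import stats

# Hypothetical engagement counts from an A/B test of two contest-discovery layouts
control = {"users": 50_000, "conversions": 6_100}    # variant A
treatment = {"users": 50_000, "conversions": 6_450}  # variant B

# Two-proportion comparison via a 2x2 contingency table (chi-squared test)
table = [
    [control["conversions"], control["users"] - control["conversions"]],
    [treatment["conversions"], treatment["users"] - treatment["conversions"]],
]
chi2, p_value, _, _ = stats.chi2_contingency(table)

lift = treatment["conversions"] / treatment["users"] - control["conversions"] / control["users"]
print(f"Absolute lift: {lift:.4f}, p-value: {p_value:.4f}")
# A small p-value (e.g. < 0.05) would suggest the difference is unlikely to be chance.
```

A result like this would typically feed back into the iterative loop described above: ship the winning variant, then form the next hypothesis.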

Tech Stack

Dream Sports’ technological infrastructure is a blend of in-house tools and third-party solutions.

“While our tech infrastructure is a combination of in-house tools and third-party solutions, we strongly believe in developing in-house solutions to minimise costs, strengthen data privacy, and control the scalability of services without compromising the architecture,” said Sharma.

One of their in-house frameworks, known as FENCE (Fairplay Ensuring Network Chain Entity), is employed to identify and address Fairplay violations, ensuring fair competition for users.

Their primary distributed ML systems rely on Spark and Ray. Transformers are utilised for various sequential learning tasks and have demonstrated superior performance compared to other deep learning models on their datasets; these models are now awaiting large-scale deployment. “We are exploring applications of LLMs in the context of the Sports ecosystem and testing internal prototypes,” Sharma commented.

For forecasting and classical machine learning applications, they rely on a range of libraries, including scikit-learn, XGBoost, Prophet, and SciPy. For deep learning tasks, the team builds models with PyTorch and TensorFlow.
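As a hedged illustration of the forecasting side of this stack, the sketch below fits a Prophet model to a made-up daily series of contest entries; the column names follow Prophet's required ds/y convention, and the data itself is invented.

```python
import pandas as pd
from prophet import Prophet

# Hypothetical daily contest-entry counts; Prophet expects columns named "ds" and "y"
history = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=180, freq="D"),
    "y": [10_000 + (i % 7) * 800 + i * 15 for i in range(180)],  # weekly pattern plus trend
})

model = Prophet(weekly_seasonality=True)
model.fit(history)

future = model.make_future_dataframe(periods=30)  # forecast the next 30 days
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```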

On the front of app development, Dream11 became one of the few tech companies to fully migrate their platform to React Native, a UI software framework. Despite industry scepticism regarding the feasibility of complete React Native adoption due to its historically low success rates, Dream11 navigated and overcame the associated challenges to make this happen.

Interview Process

Dream11 places a strong emphasis on valuing skills when hiring, seeking top talent aligned with their goal of enhancing the sports experience, encapsulated by the acronym “DOPUT”: Data-Driven, Ownership, Performance, UserFirst, and Transparency.

When hiring data science professionals, the organisation prioritises cultural fit initially. Upon meeting this criterion, candidates undergo a customised hiring process that varies by role. For most data science applicants, this process includes an aptitude test, followed by progressive technical interviews covering areas such as R programming, ML, and practical mock projects. Domain-specific interviews led by team leaders provide a thorough evaluation of the candidate’s preferences and skills.

A common mistake candidates make while interviewing is missing out on the basic foundations of ML, statistics, or experimentation, which are highly valued at the organisation.

“While building models is easy, making them useful is tougher. That’s where hands-on implementations become a force multiplier,” said the CTO. Prospective team members can expect a supportive work environment with access to extensive qualitative data, challenging machine learning tasks, advanced infrastructure, a motivated team, and the chance to contribute at a large scale.

Work Culture

Driven by its culture, the company fosters an open and transparent atmosphere. Certified as a Great Place to Work, Dream Sports adopts a hyper-experimentation approach known as “HEAL – Hypothesis, Experiment, Analysis, and Learning” allowing employees to experiment, embrace failure, swiftly learn, and create personalised user features.

Employees enjoy a range of special perks and benefits like the ‘Learning Wallet’, unlimited leave, ESOPs, insurance, mental wellness initiatives and more. The Learning Wallet supports diverse learning ambitions, allowing individuals to explore areas such as design or coding regardless of their primary expertise. The unlimited leave policy promotes a healthier work-life balance, including the ‘Unplugged’ feature—a unique seven-day work-free vacation opportunity. Additionally, employees enjoy fully paid access to sports events, matches, and tournaments.

“Most importantly, besides several industry-first benefits, we offer access to the latest tech stack and prioritise building a thriving culture through various engagement activities,” concluded Sharma.




Musk Makes X Heaven For Creators

Elon Musk's takeover of Twitter was the talk of the town last year. People took different sides: some said it would make the social media platform worse, while others were thrilled with the takeover. Now, renamed X, the platform is arguably one of the better ones out there, and Musk is in a frenzy to make it the best there is.

In its most recent development, X has released its latest algorithm with close to 10,000 changes to its code. Unsurprisingly, most of them are aimed at boosting creators' content on the platform. It may be high time to switch your account to a creator one, and perhaps pay Musk the $8 in hopes of thousands in returns. Meanwhile, Mark Zuckerberg's Threads has seen a massive 70% decline in users.

No repetitive conversations

Firstly, X does not want you to be bombarded with the same conversations again and again in the "For You" feed. Replies will get more traction than simple retweets of a post and will therefore be given higher priority in the feed to keep conversations fresh.

Moreover, these reply streams will now contain more ads. This is to boost ad revenue for creators with subscriber-only content. X ultimately wants more users to subscribe and become creators on the platform. At the same time, this comes with an "engagement bait warning", which flags accounts that simply use clickbait wording to bait people into following them.

Funnily enough, on Threads, Mark Zuckerberg is also trying to promote ad-based content and drive conversations. One would say he is trying too hard to be like Musk, and failing miserably.

I can’t tell what’s real these day 😂
Can someone please let him know on 🧵that I’m popping by his house to fight as soon as he’s “back in town”.

— Elon Musk (@elonmusk) August 15, 2023

The best part is that X's algorithm no longer judges the worth of your content by the likes and retweets it gets before pushing it to the masses. All you have to do is put up a post on trending and popular topics and X will push it into the feed.

All eyes on X

Musk does not want you, or your audience, to leave X at all. Posts with links that take users off the platform, including YouTube or Instagram links, will be deboosted.

Speaking of YouTube and Instagram, X is also going to boost posts containing rich media such as photos, gifs, or videos. Moreover, you can talk about anything on X and it will reach the audience that resonates with it, even if they are not your followers. But if you have an audience of more than 10,000, you will have to stick to the content that you are creating to get more engagement.

Not just media posts: X also pushes posts with long-form content, rather than shorter content that can disappear with a single scroll of the feed. If you ace all of these things and rise up the algorithm, X will push your content even further. There are now two durations for posts, one of 24 hours and the other of 48 hours. If a post performs well in its first 24 hours, it is pushed for the expanded 48-hour duration to reach a wider audience.

Adding to this, if you reply to your own posts and to people interacting with them, those replies carry double the weight of any other replies. This shows X that you are actually engaging with your audience instead of just posting and hoping for ad revenue.

Heaven for creators and xAI

Musk is simply turning X into an everything app and wants all creators on his platform. Meta tried to push all its Instagram creators onto Threads by promoting them on the platform and carrying their followers over. But as we can clearly see, Threads is already a dying platform; X does not even try to compete with it anymore.

Arguably, the race among these social media giants is all about AI. Zuckerberg wanted to gather user data through Threads, probably to improve Meta's AI model LLaMA. But given that people are leaving the app, Zuckerberg's hopes of collecting that data are leaving with them. Musk, on the other hand, is continually able to attract more people to X. He has already said that he is going to use the data from the platform to train xAI's upcoming model.

By making the platform creator-friendly and hopping onto the short-video trend started by TikTok and continued on Instagram Reels, X wants you to ditch every other platform, hop onto it, and quite literally start reeling in money. People have been posting screenshots of the thousands of dollars they have been receiving in ad revenue on the platform.

This might be the best time to become a creator on X, record a video on a trending topic, and post it to get into everyone’s “For You” feed.


5 Best Papers Presented at SIGGRAPH 2023

Generative AI has taken the graphics world by storm. New models, redefining the industry, seem to emerge almost every other day. But the fusion of art and technology is not new. Every year, ACM SIGGRAPH (Special Interest Group on Computer Graphics and Interactive Techniques) presents an event where pixels and algorithms paint dreams. The distinguished academic organisation, the Association for Computing Machinery, put on the show in Los Angeles from August 6 to 10.

Among the notable speakers, NVIDIA CEO Jensen Huang took the stage this year to announce several new products and research focusing on generative AI, computer graphics, and the company’s role in OpenUSD developments.


This year marked the 50th anniversary of SIGGRAPH. The technical paper awards were introduced only in 2022, recognising that these papers serve as the pillar of the conference’s scholarly work. Technical Papers Chair Alla Sheffer highlighted the award-winning papers and thanked the selection committee, which chose the Best Papers from a pool of hundreds.

Here are the 5 best award-winning papers from this year’s conference:

Split-Lohmann Multifocal Displays

The new paper introduces an impressive near-eye 3D display that quickly creates virtual worlds and lets users focus naturally on objects at different distances. This means you can enjoy realistic 3D videos and games like never before, feeling fully immersed.

Authors: Yingsi Qin, Wei-Yu Chen, Matthew O’Toole, Aswin C. Sankaranarayanan, Carnegie Mellon University

Differentiable Stripe Patterns for Inverse Design of Structured Surfaces

In this work, researchers presented an innovative technique called “Differentiable Stripe Patterns.” This computational method automates the design of physical surfaces with distinct stripe-shaped, dual-material arrangements.

The team has developed a tool that employs optimization through gradients to automatically generate stripe patterns. These patterns are tailored to closely match specific goals for overall mechanical performance.

Authors: Juan Sebastian Montes Maestre, Yinwei Du, Ronan Hinchet, Stelian Coros, Bernhard Thomaszewski, ETH Zürich

Globally Consistent Normal Orientation for Point Clouds by Regularizing the Winding-number Field

The researchers have put forth a smooth objective function to define the criteria for an acceptable winding-number field. This innovation allows the determination of globally consistent normal orientations, even when starting from an initial set of entirely random normals.

Authors: Rui Xu, Shandong University; Zhiyang Dou, The University of Hong Kong; Ningna Wang, The University of Texas at Dallas; Shiqing Xin, Shandong University; Shuangmin Chen, Qingdao University of Science and Technology; Mingyan Jiang, Shandong University; Xiaohu Guo, The University of Texas at Dallas; Wenping Wang, Texas A&M University; Changhe Tu, Shandong University

3D Gaussian Splatting for Real-time Radiance Field Rendering

The new technique enables real-time display of radiance fields with impressive visual quality, achieving a rendering rate of at least 30 frames per second. The researchers depict scenes using precise 3D Gaussians, facilitating efficient optimization processes.

The inclusion of visibility-aware rendering accelerates training, matching the speed of the fastest prior methods while maintaining comparable quality. Additionally, just one extra hour of training enhances the output to a state-of-the-art level of quality.

Authors: Bernhard Kerbl, Inria, Université Côte d’Azur; Georgios Kopanas, Inria, Université Côte d’Azur; Thomas Leimkuehler, Max-Planck-Institut für Informatik; George Drettakis, Inria, Université Côte d’Azur

DOC: Differentiable Optimal Control for Retargeting Motions Onto Legged Robots

The team of researchers at Disney Research have introduced a novel framework called Differentiable Optimal Control (DOC), which simplifies the calculation of analytical derivatives for optimal control and state trajectories based on user-defined parameters.

The work demonstrates its effectiveness by swiftly adapting motion capture and animation data onto a range of legged robots with differing proportions and mass distribution.

Authors: Ruben Grandia, Disney Research Imagineering; Farbod Farshidian, ETH Zürich; Espen Knoop, Disney Research Imagineering; Christian Schumacher, Disney Research Imagineering; Marco Hutter, ETH Zürich; Moritz Bächer, Disney Research Imagineering


DEF CON Generative AI Hacking Challenge Explored Cutting Edge of Security Vulnerabilities


OpenAI, Google, Meta and more companies put their large language models to the test on the weekend of August 12 at the DEF CON hacker conference in Las Vegas. The result is a new corpus of information shared with the White House Office of Science and Technology Policy and the Congressional AI Caucus. The Generative Red Team Challenge organized by AI Village, SeedAI and Humane Intelligence gives a clearer picture than ever before of how generative AI can be misused and what methods might need to be put in place to secure it.


Generative Red Team Challenge could influence AI security policy

The Generative Red Team Challenge asked hackers to force generative AI to do exactly what it isn’t supposed to do: provide personal or dangerous information. Challenges included finding credit card information and learning how to stalk someone. The AI Village team is still working on analyzing the data that came from the event and expects to present it next month.

This challenge is the largest event of its kind and one that will allow many students to get in on the ground floor of cutting-edge hacking. It could also have a direct impact on the White House’s Office of Science and Technology Policy, with office director Arati Prabhakar working on bringing an executive order to the table based on the event’s results.

Organizers expected more than 3,000 people would participate, with each taking a 50-minute slot to try to hack a large language model chosen at random from a pre-established selection. The large language models being put to the test were built by Anthropic, Cohere, Google, Hugging Face, Meta, NVIDIA, OpenAI and Stability. Scale AI developed a scoring system.

“The diverse issues with these models will not be resolved until more people know how to red team and assess them,” said Sven Cattell, the founder of AI Village, in a press release. “Bug bounties, live hacking events and other standard community engagements in security can be modified for machine learning model-based systems.”


The AI Village team will use the results of the challenge to make a presentation to the United Nations next month, Rumman Chowdhury, co-founder of Humane Intelligence, an AI policy and consulting firm, and one of the organizers of the AI Village, told Axios.

That presentation will be part of the trend of continuing cooperation between the industry and the government on AI safety, such as the DARPA project AI Cyber Challenge, which was announced during the Black Hat 2023 conference. It invites participants to create AI-driven tools to solve AI security problems.

What vulnerabilities are LLMs likely to have?

Before DEF CON kicked off, AI Village consultant Gavin Klondike previewed seven vulnerabilities someone trying to create a security breach through an LLM would probably find:

  • Prompt injection.
  • Modifying the LLM parameters.
  • Inputting sensitive information that winds up on a third-party site.
  • The LLM being unable to filter sensitive information.
  • Output leading to unintended code execution.
  • Server-side output feeding directly back into the LLM.
  • The LLM lacking guardrails around sensitive information.

“LLMs are unique in that we should not only consider the input from users as untrusted, but the output of LLMs as untrusted,” he pointed out in a blog post. Enterprises can use this list of vulnerabilities to watch for potential problems.
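To make that point concrete, here is a minimal, hedged sketch of treating LLM output as untrusted: instead of executing or acting on whatever the model returns, the application parses it and validates it against an allow-list first. The function, action names, and JSON shape are illustrative assumptions, not part of any specific library.

```python
import json

# Actions the application is willing to perform, regardless of what the LLM suggests
ALLOWED_ACTIONS = {"lookup_order", "send_receipt", "escalate_to_human"}

def handle_llm_output(raw_output: str) -> dict:
    """Treat LLM output as untrusted: parse it, then validate it before acting on it."""
    try:
        action = json.loads(raw_output)  # expect e.g. {"name": "lookup_order", "args": {...}}
    except json.JSONDecodeError:
        return {"error": "LLM returned non-JSON output; refusing to act on it"}

    if action.get("name") not in ALLOWED_ACTIONS:
        # Never execute arbitrary instructions or code coming back from the model
        return {"error": f"Action {action.get('name')!r} is not on the allow-list"}

    return {"ok": True, "action": action}

# Example: a prompt-injected response asking the app to run an unapproved action is rejected
print(handle_llm_output('{"name": "delete_all_records", "args": {}}'))
```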

In addition, “there’s been a bit of debate around what’s considered a vulnerability and what’s considered a feature of how LLMs operate,” Klondike said.

These features might look like bugs if a security researcher were assessing a different kind of system, he said. For example, the external endpoint could be an attack vector from either direction — a user could input malicious commands or an LLM could return code that executes in an unsecured fashion. Conversations must be stored in order for the AI to refer back to previous input, which could endanger a user’s privacy.

AI hallucinations, or falsehoods, don’t count as a vulnerability, Klondike pointed out; although they are factually incorrect, they aren’t dangerous to the system itself.

How to prevent LLM vulnerabilities

Although LLMs are still being explored, research organizations and regulators are moving quickly to create safety guidelines around them.

Daniel Rohrer, NVIDIA vice president of software security, was on-site at DEF CON and noted that the participating hackers talked about the LLMs as if each brand had a distinct personality. Anthropomorphizing aside, the model an organization chooses does matter, he said in an interview with TechRepublic.

“Choosing the right model for the right task is extremely important,” he said. For example, ChatGPT potentially brings with it some of the more questionable content found on the internet; however, if you’re working on a data science project that involves analyzing questionable content, an LLM system that can look for it might be a valuable tool.

Enterprises will likely want a more tailored system that uses only relevant information. “You have to design for the point of the system and application you’re trying to achieve,” Rohrer said.

Other common suggestions for how to secure an LLM system for enterprise use include:

  • Limit an LLM’s access to sensitive data.
  • Educate users on what data the LLM gathers and where that data is stored, including whether it is used for training.
  • Treat the LLM as if it were a user, with its own authentication/authorization controls on access to proprietary information (a minimal sketch of this idea follows this list).
  • Use the software available to keep AI on task, such as NVIDIA’s NeMo Guardrails or Colang, the language used to build NeMo Guardrails.
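As a rough illustration of the third suggestion, treating the LLM as a user with its own authorization scope, the sketch below gates which documents may ever be retrieved into a prompt. The roles, file names, and helper function are hypothetical and are unrelated to NeMo Guardrails or Colang.

```python
# Hypothetical access-control check applied before any document reaches the LLM's context
DOCUMENT_ACL = {
    "public_faq.md": {"public"},
    "pricing_internal.xlsx": {"employee"},
    "board_memo.docx": {"executive"},
}

LLM_SERVICE_ROLES = {"public"}  # the LLM service account gets its own, deliberately narrow role

def documents_visible_to_llm() -> list[str]:
    """Only documents whose ACL intersects the LLM's roles may be retrieved into a prompt."""
    return [doc for doc, roles in DOCUMENT_ACL.items() if roles & LLM_SERVICE_ROLES]

print(documents_visible_to_llm())  # ['public_faq.md'] -- internal files never enter the prompt
```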

Finally, don’t skip the basics, Rohrer said. “For many who are deploying LLM systems, there are a lot of security practices that exist today under the cloud and cloud-based security that can be immediately applied to LLMs that in some cases have been skipped in the race to get to LLM deployment. Don’t skip those steps. We all know how to do cloud. Take those fundamental precautions to insulate your LLM systems, and you’ll go a long way to meeting a number of the usual challenges.”


Microsoft Defender for Cloud Gets More Multicloud


Almost 90% of enterprises use more than one public cloud provider, according to Flexera’s 2023 State of the Cloud survey. For enterprise cloud users, managing their multicloud workloads is the second biggest challenge after managing cloud costs. Microsoft Defender for Cloud aims to help with that.

Businesses can already use Microsoft Defender for Cloud to monitor security settings on AWS and Google Cloud Platform as well as Azure. Beginning August 15, 2023, businesses will also be able to identify security risks and attack paths, scan for secrets and discover sensitive data stored in Google Cloud. These cloud security posture management features were previously only available for AWS and Azure and now will apply to all three main clouds. Microsoft Defender for Cloud can even turn on best practices from several key standards for AWS, Azure and now GCP automatically.


Get a baseline using the Microsoft cloud security benchmark

“A lot of our customers are not single cloud – it’s really rare,” Microsoft VP of strategy for SIEM and XDR Raviv Tamir told TechRepublic. “Most customers go multicloud because they want to divide the risk. But then the problem is applying policy across (those clouds) in a consistent way.”

To help with that, Microsoft turned its Azure Security Benchmark into a cross-platform tool, renaming it the Microsoft cloud security benchmark. The MCSB combines relevant recommendations from the Center for Internet Security, the National Institute of Standards and Technology and the Payment Card Industry Data Security Standard or PCI-DSS, Tamir explained.

“It’s a baseline that tries to align across these three standards and take all the technical parts of it and then tell you sort of: How do you measure up vs Azure, and how do you measure up vs AWS? With the new GCP connector, we can align that also to GCP so you can get all your three hyperscale clouds in one go.”

While GCP benchmark coverage is in public preview, you can add your GCP environment to Microsoft Defender for Cloud and get free resource monitoring with those best practices automatically enabled.

“We do the central baseline, because you can have a policy, but even translating that into those controls is complex, because what does it mean (for each cloud)? So we try to take that load off you, and we are doing the policy centrally.”

Find vulnerabilities and predict attacks with a graph database

Microsoft has long maintained that defenders think in terms of the lists of their assets, while attackers think in graphs of how systems are connected so they can jump from the initial breach into more valuable services.

With the GCP connector, Microsoft Defender for Cloud can build a graph database of everything you have in the cloud across AWS, Azure and Google Cloud. Then, you can explore that to understand what data you have and where you can be attacked. Tamir calls this a “data aware security posture” that can find and protect sensitive data.

He added, “We’re taking all the data that we can scrape off your GCP buckets, and aligning them onto the assets in the graph. All your assets, inventory, vulnerabilities and configurations are now hooked on the assets in the graph and connected.”

The data is scanned for sensitive data (e.g., credit card details, social security numbers and any custom information types you’ve defined in Microsoft Purview) that you wouldn’t want to see lost in a data breach. “We’re using the data tagging that comes from the DLP (data loss prevention) side of the house so you can tag using the same policies,” he explained. “As we go through this data, we also scan and tag everything we see. And yet again, that’s another great layer that gets added to the graph.”

Your cloud servers and, if you have them, Defender Vulnerability Management containers (Figure A) are also scanned for secrets (i.e., credentials such as SSH private keys, access keys and SQL connection strings) that you shouldn’t store in the cloud, as well as for known vulnerabilities. That won’t affect the performance of those workloads. “To make that graph complete, we also do agentless scanning because we need to analyze all the logs and all the data that comes in to enrich the graph,” Tamir explained.

Figure A: The information in the security graph shows that you have a container with serious vulnerabilities running in a Kubernetes pod that can be accessed from the internet. (Image: Microsoft)

He added, “That all goes into a database, and you can query that database. We’re giving you the nice interconnected view of everything that you have.”

Putting the different pieces of information together like this helps you assess how serious a problem is. If you have a vulnerability in a virtual machine that has access to a service like Azure Key Vault, you’ll want to prioritize fixing that. Similarly, if the vulnerability is in a system that doesn’t have access to credentials but does have sensitive data, you should also care about it.

Attack path analysis

Exploring the graph as a defender lets you see all your resources the way an attacker would, but not everyone knows what to look for, so Microsoft is building tools to help security teams prioritize what needs fixing — the first is attack path analysis (Figure B).

Figure B: The attack path analysis shows that an attacker could get into a VM that’s exposed to the internet because it has high-severity vulnerabilities and go through several other systems to get to a storage bucket. (Image: Microsoft)

“Without doing any probing, and just based on all the data that we accumulate in the graph, this is telling you the sets of possible attacks, and then we show you what would be the impact of this attack, because you have a vulnerable set of VMs that have access to, say, key storage. We can tell you what the potential outcome is, which helps you focus on the more important things,” Tamir pointed out. “And in the future, this will be a basis for us being able to tell where the attack is going, not just where it is right now.”
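For readers who want a feel for what reasoning over such a graph looks like, here is a small, hedged sketch using Python and the networkx library. The asset names, edges, and the simple "internet to sensitive asset via a vulnerable node" rule are entirely hypothetical and far cruder than Defender for Cloud's actual model.

```python
import networkx as nx

# Hypothetical cloud asset graph: an edge means "can reach" or "has access to"
g = nx.DiGraph()
g.add_edge("internet", "web-vm")         # VM exposed to the internet
g.add_edge("web-vm", "app-vm")           # lateral movement possibility
g.add_edge("app-vm", "storage-bucket")   # app identity can read the bucket
g.add_edge("app-vm", "key-vault")        # app identity can read secrets

vulnerable = {"web-vm"}                        # assets with high-severity vulnerabilities
sensitive = {"storage-bucket", "key-vault"}    # assets holding sensitive data or credentials

# An "attack path" here is any route from the internet, through a vulnerable asset,
# to something sensitive; real products use far richer signals than this.
for target in sensitive:
    for path in nx.all_simple_paths(g, "internet", target):
        if vulnerable & set(path):
            print(" -> ".join(path))
```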

Protect cloud storage with new malware scanning

You don’t just want to stop attackers getting into your cloud storage — you also want to stop them from sneaking malware into your storage.

Traditionally, storage at rest doesn’t get scanned for malware because the assumption is the malware can’t execute when it’s sitting in a storage bucket – and if it does end up on an endpoint where it can be run, the defenses there will catch it. Microsoft Defender for Cloud can protect a wide range of devices, but that’s not enough to keep you safe, Tamir warned.

One customer allows their users to upload information for support agents to look at to help them. Tamir noted: “That information is immediately viewed by an agent, so the time it spends in the bucket storage before it actually gets consumed is really short, and the malware authors use it as a way of distributing malware. And in this case, it was ransomware.”

Other organizations have compliance rules like NIST and SWIFT for their data governance that mean they have to scan all data, but they don’t do it in real time. Tamir said, “They’ve been lazy scanning, and they have to set up all sorts of their own infrastructure and pull the data into like a VM and then scan it and then try to put it back. We can do that for them: We can do it quicker, we can do it without the hit of performance, and we can actually do it on upload.”

The new Malware Scanning in Defender for Storage is for Azure Blob storage only and will be available from September 1 as an optional extra for Defender for Storage, costing $0.15 per GB of data scanned.

Tamir said, “It’s not just file scanning, it’s not just hash, it’s not just IOCs (indicators of compromise); we’re actually doing polymorphic scanning.” And while the malware scanning is automated and delivered as a service rather than infrastructure you have to manage, you can still choose what happens when malware is detected. He added, “You can decide whether you just want us to tell you that it’s bad, or you want us to actually take an action, or you want to take the action somewhere else.”

Where Microsoft Defender for Cloud goes next

Defender for Storage

The next step for malware scanning in Defender for Storage will be scanning files more frequently, not just when they’re uploaded, to look for malware identified since then. Tamir suggested, “There are more polymorphic chains of malware that we discover every day.” The scale of cloud storage makes that a challenge. “These are really huge buckets; (if you’re) scanning them periodically, you will never get to the end, so we need to find a smart way of scanning them, whether it’s on access or some other trigger.”

How AI and automation could help

There are also many more opportunities to use the information in the graph that Defender for Cloud builds to protect customers, making it easier to avoid mistakes in the security and configuration settings that protect you, and to do more beyond that.

“In general, in Microsoft (products) we have a lot of places where you set policies and not enough coordination between them,” Tamir noted. “If I set DLP policies, I want to set them centrally in one place – maybe it’s Microsoft Purview. And then I want that to move across all of my assets, and every enforcement point that I have should yield to that policy rather than me having to go and set those policies individually.”

Not only does he want applying those policies to take far less work; he also suggested that, instead of someone manually checking and applying the right security baseline, automation and AI could do more of the work of setting the right policies in the first place.

Tamir added, “With cloud, people started the right way, saying instead of dealing with things post breach, let’s set the configurations right to begin with – and then we found out that the configuration problem is just as big!”

“This whole notion of shift left that everybody’s talking about; we still have a lot of manual steps in it – a lot of reasoning people need to do,” Tamir said. “I think there’s a revolution that must come in two parts. One, there needs to be more automation controlled stuff than human controlled stuff; automation will be really critical here because the information density is impossible.” The second step will be to add AI to automation. Tamir stated, “I think it will be a really good challenge for things like generative AI, for reasoning over things that are complex in the sense that they seem the same, but they’re not necessarily the same.”

Tamir concluded, “When people ask me, can I take my sets of compliance that are overlapping, and then tell me what the common denominator is for all of them, and what should I do to do that? I think that’s a problem that is primed well for tools like generative AI.”


Amazon now using generative AI to summarize customer reviews


You're shopping for an item on Amazon and want to see what other buyers think of it. But it's a popular product that's triggered hundreds or even thousands of individual reviews. Instead of reading them all, you'll now be able to view an AI-generated summary designed to encapsulate the opinions of all those shoppers.

On Monday, Amazon announced the official rollout of its new AI product review option. Now available to a cross-section of mobile shoppers in the US, the generative AI technology will serve up a short paragraph on a product's detail page highlighting the core features and summarizing the overall customer sentiment. The AI-generated highlights will also let you choose a specific product attribute, such as "ease of use" or "reliability" to see excerpts of reviews mentioning that factor.


The company started testing the new feature in June but has now rolled it out on a more widespread basis. If the option is available on a product you're viewing, a customer reviews section will appear above the actual reviews with a short summary and a notice that it's AI-generated from the text of customer reviews. Selecting one of the attributes below the paragraph displays a summary focused on that factor and brief snippets from several customer reviews.

Amazon isn't the only online retailer to offer AI-generated summaries of product reviews. Last week, fellow vendor Newegg announced that it's been using ChatGPT to condense customer reviews into small snippets known as "Review Bytes" accompanied by longer paragraphs created by AI. For now, the option is accessible only on the desktop version of the Newegg website and only for products that have a certain minimum number of reviews.

But using AI to generate review summaries triggers a couple of questions. First, can you trust that the AI is accurately reflecting the many reviews that it summarizes? Second, how does the AI avoid or exclude fake and false reviews?

To address the first question, Amazon said only that it continues to test and fine-tune its AI models to improve the experience. As for the second question, phony reviews are a problem that has plagued Amazon and its customers for years. And it's an issue that the company keeps trying to combat.

Amazon said that it continues to invest in resources to stop fake reviews before they pop up. To detect fraudulent reviews and unusual behavior, the company uses machine learning models to analyze thousands of data points, including relationships to other accounts, sign-in activity, and review history. Further, investigators use fraud-detection tools to analyze and prevent fake reviews. And for the new AI summaries, Amazon said that it incorporates only trusted reviews from verified purchases.


The company has also filed lawsuits against fake review brokers who contact customers through websites, social media, and messaging services and push them to write fake reviews in exchange for money or free products. In 2022, Amazon blocked more than 200 million suspected fake reviews from appearing online.


What to Know About StableCode: The AI Code Generator From Stability AI

In today's rapidly evolving tech landscape, AI-powered solutions are playing a crucial role in transforming industries. One such game-changer is StableCode, developed by Stability AI. This revolutionary tool is not just another code generator but a sophisticated blend of technology designed to make coding more accessible, efficient, and innovative. Let's dive deep into understanding what makes StableCode stand out.

The Triad of StableCode's Power

StableCode's efficiency stems from its foundation based on three distinct but interconnected models: the base model, the instruction model, and the long-context window model.

1. Base Model: The Cornerstone

The base model, a product of intense training on The Stack dataset (v1.2) from BigCode, is truly the bedrock of StableCode. Drawing on a colossal 560 billion tokens of code gathered from varied sources such as GitHub, Stack Overflow, and Kaggle, this model possesses an intricate understanding of a wide array of programming languages like Python, Java, C, JavaScript, and many more. Its constant evolution ensures that it continually refines its code generation capabilities, making it a reliable assistant for developers.

2. Instruction Model: The Guide

Built atop the base model, the instruction model is the guiding light for complex problem-solving. It has honed its skills through training on approximately 120,000 code instruction/response pairs in the Alpaca format. This enables the model to convert natural language instructions into actionable code. Whether you instruct it to “create a Python function that calculates the Fibonacci sequence” or “design an API endpoint in Go”, the instruction model is equipped to deliver.
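For context, instruction data in the Alpaca format pairs a natural-language instruction (plus optional input) with the desired response. A single, hypothetical training record matching the Fibonacci example above might look like this in Python:

```python
# A single, hypothetical instruction/response pair in the Alpaca-style format
alpaca_record = {
    "instruction": "Create a Python function that calculates the Fibonacci sequence.",
    "input": "",  # optional extra context; empty here
    "output": (
        "def fibonacci(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
    ),
}
```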

3. Long-Context Window Model: The Expanded Horizon

Touted as StableCode's most advanced feature, the long-context window model can juggle vast chunks of code, nearly 2-4 times more than some of its contemporaries. With a context window that spans 16,000 tokens, developers can seamlessly review or edit the equivalent of up to five average-sized Python files concurrently. This ensures that while working on expansive projects, developers never lose the narrative of their code.

How to Use StableCode

Amid the rise of AI-driven tools, StableCode stands out as a coding-specific LLM, offering a unique experience that melds coding efficiency with advanced AI capabilities. If you're keen on navigating this transformative tool, here's a simple guide to kick-start your StableCode journey.

  • Integration with Google Colab: For those looking to get their hands dirty right away, StableCode's seamless integration with Google Colab is a great starting point. This integration not only simplifies the user experience but also offers an interactive platform to experiment with, whether you're aiming to generate intricate code snippets or merely diving into basic tasks like executing a binary search in Python.
  • Utilizing the Hugging Face Model Card: To further streamline the usage process, StableCode is accessible through the Hugging Face model card. This accessibility means that introducing StableCode into a web-based UI becomes an effortless endeavor. Regardless of the complexity of your coding tasks, StableCode is right there to offer assistance, optimization, and more (a minimal loading sketch follows this list).
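As a hedged illustration of the Hugging Face route, the sketch below loads a StableCode checkpoint with the standard transformers API and completes a code prompt. The model ID is an assumption and should be checked against Stability AI's model cards; a GPU (or some patience) is assumed for a model of this size.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID is an assumption; check Stability AI's Hugging Face model cards for the exact name
model_id = "stabilityai/stablecode-completion-alpha-3b-4k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ask the model to continue a started class definition
prompt = "import torch\nimport torch.nn as nn\n\nclass MLP(nn.Module):"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```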

Developer's Note: “While StableCode brings groundbreaking innovations to the coding world, it's essential to employ this model judiciously. We urge users to refrain from using StableCode for any illicit content creation, promoting unlawful actions, or engaging in activities posing significant physical or economic threats.”


StableCode 16K

As we move further into the realm of AI-driven tools, the demand for broader context and more efficient coding solutions becomes evident. Enter the StableCode 16K—a revolutionary model designed to cater to these exact needs.

A Window to Expansive Context

While the foundational StableCode offers a 4K context window, Stability AI recognized the value of a larger coding lens. The StableCode 16K, with its impressive 16,000-token context window, stands tall among its counterparts. This expansive window ensures that the model can simultaneously view a significantly more extensive codebase, enhancing its capacity to tackle tasks and refine code generation.

Viewing and Editing Proficiency

Stability AI's commitment to creating a user-centric product shines through with the 16K model's capabilities. Imagine being able to access or modify the equivalent of five medium-sized Python files simultaneously. This feature not only underscores the model's robustness but serves as a boon for beginners who can benefit from its holistic code view, thereby aiding in better comprehension and task execution.

Single or Multi-Line Code Generation

The beauty of StableCode, be it the foundational or the 16K variant, lies in its versatility. Both models are proficient in generating and completing code, irrespective of whether it's a single line or multiple lines, making it a go-to tool for a wide range of coding needs.

Stability AI's mission transcends beyond mere coding assistance.

In their words: “People of every background will soon be able to create code to solve their everyday problems and improve their lives using AI, and we’d like to help make this happen.” This sentiment reaffirms the company's drive to democratize tech, ensuring that coding and AI solutions are within everyone's grasp, irrespective of their background.

StableCode vs. The Rest

While StableCode is not the first AI tool aiming to generate code from natural language, it has certainly carved a niche for itself. When benchmarked against tools like GitHub Copilot and SourceAI, StableCode reportedly displayed superior accuracy and efficiency, making it a preferred choice for many.

Why Choose StableCode?

In an era where multiple AI tools claim to simplify the coding experience, the differentiation often lies in the details. StableCode, with its bespoke features and user-centric approach, certainly offers compelling reasons to be the tool of choice for many. Here's a more in-depth look at what makes StableCode a favorable choice for developers, learners, and enthusiasts alike.

1. Elevated Productivity for the Modern Developer

  • Bug Detection: One of the perennial challenges in coding is the detection and resolution of bugs. StableCode's advanced algorithms proactively identify potential errors, saving hours that developers might otherwise spend in debugging.
  • Refactoring Assistance: Code optimization is essential for enhancing performance and maintainability. StableCode assists in refactoring, suggesting cleaner and more efficient ways to structure the code. This not only makes the codebase more manageable but also improves its overall quality.
  • Auto-completion: In the fast-paced world of coding, every second counts. StableCode's auto-completion feature accelerates the coding process, suggesting contextually relevant code snippets as developers type. This not only speeds up development but also ensures that the code adheres to best practices.

2. A Learning Companion for Every Step of Your Journey

StableCode isn't just for the experts. Whether you're a beginner taking your first steps into the world of coding or an intermediate developer exploring new territories, StableCode is right beside you. Its intuitive interface provides:

  • Guided Insights: StableCode offers proactive suggestions and insights, making the learning curve smoother. For those exploring new languages or frameworks, these insights can be invaluable.
  • Solutions to Challenges: Every coder, regardless of their expertise level, occasionally encounters challenges. StableCode offers potential solutions, serving as a reliable assistant whenever you're stuck or need a fresh perspective.

3. A Commitment to Accessibility

In the digital age, accessibility is paramount. StableCode's commitment to democratizing coding knowledge is evident in its model:

  • Freemium Model: StableCode is available free of charge for personal and academic pursuits. This means students, hobbyists, or anyone curious about coding can access state-of-the-art AI-driven coding assistance without any financial barriers.
  • Ubiquitous Access: With just a web browser, anyone can start their coding journey with StableCode. There's no need for elaborate setups or expensive infrastructure, making it a true testament to bridging the digital divide.

The Future of Coding with StableCode

In the annals of technological evolution, there comes a time when a particular invention or innovation manages to redefine the paradigms. StableCode, with its impressive array of capabilities and forward-looking vision, appears poised to be one such disruptor in the domain of coding. But what makes this development truly exhilarating is not just its technical prowess but the ethos with which it has been created.

StableCode is more than just a tool; it's a vision of a more inclusive, more efficient, and more accessible coding future. It's an embodiment of Stability AI’s aspiration to bridge the digital divide, democratize technological know-how, and empower every individual, irrespective of their background, to harness the magic of coding. This isn’t merely about writing lines of code; it’s about granting the power to create, innovate, and make a difference using technology.

As we stand at this intersection of AI and coding, one thing is crystal clear: The journey ahead is full of potential. With tools like StableCode leading the way, the future for budding developers, experienced programmers, and every tech enthusiast looks brighter than ever. We're not just witnessing a transformation in how we code but potentially in how we think, learn, and create. The future beckons, and with StableCode, it seems we're more than ready for it.

Challenges and solutions in Big Data management


Big Data Management has become a pivotal part of modern business, influencing decisions, shaping strategies, and offering unparalleled insights. With the exponential growth of data from myriad sources, managing it effectively is more critical than ever. However, big data’s sheer volume, variety, and velocity present a unique set of challenges. These challenges range from integration and quality control to security and performance, requiring robust solutions and innovative approaches.

This post explores the complex landscape of big data management, delving into the critical issues organizations face and the solutions that can overcome them. By understanding these challenges and embracing the tools and methodologies that can address them, businesses can unlock the true potential of big data, transforming raw information into actionable intelligence.

The explosion of Big Data: A brief overview

In recent years, the world has witnessed an unprecedented amount of data being generated. This phenomenon, known as Big Data, encompasses information from various sources, such as social media platforms, IoT (Internet of Things) devices, e-commerce websites, and more. It is defined not just by the staggering volume but also by the variety – structured, unstructured, semi-structured – and the velocity at which it is created and processed.

The role of big data has expanded across various industries, including healthcare, finance, retail, and transportation. In healthcare, it is used for personalized medicine and predictive analytics; in finance, for fraud detection and risk management; in retail, for customer behavior analysis and inventory optimization. The applications are limitless.

However, the sheer scale of big data brings complexities and challenges in management. The need for efficient storage solutions, integration of data from disparate sources, ensuring quality, and real-time processing demands innovative and robust management strategies.

Challenges in Big Data management

Data integration and quality control

Integrating several types of data from diverse sources is a complex challenge. This includes merging structured data from traditional databases with unstructured or semi-structured data from emails, social media, and other channels. Ensuring consistency and quality becomes an uphill task, requiring sophisticated algorithms and manual oversight. Poor-quality data can lead to inaccurate analyses and misguided decision-making, making quality control paramount.

Data security and compliance

As data volumes grow, so do the risks related to data security. Protecting sensitive information, such as customer details or intellectual property, is crucial. Compliance with regulatory requirements like GDPR (General Data Protection Regulation) adds another layer of complexity. Not meeting these standards could lead to legal repercussions and reputational harm.

Performance and scalability

Handling large datasets requires robust storage solutions and high-speed processing capabilities. Traditional systems may falter under the sheer load of big data, leading to performance bottlenecks. Scalable solutions that can adapt to fluctuating data volumes are essential to prevent system failures and maintain efficiency.

The complexity of data preparation

Data preparation involves cleaning, transforming, and enriching data to make it suitable for analysis. This can be time-consuming and labor-intensive, often taking up to 80% of the time in a data project. The complexity is compounded when dealing with big data, as inconsistencies, missing values, and errors become more prevalent.

Real-time analysis and processing

Big data is often most valuable when analyzed in real time. Whether tracking stock market trends or monitoring health conditions, the ability to process and analyze data as it is generated is vital. Traditional batch processing methods may not suffice, necessitating the development of real-time analytics capabilities.

Skill gap and resource constraints

Big data management requires specialized skills and knowledge. The market has a noticeable skill gap, with too few professionals who have expertise in big data technologies. Additionally, small and medium-sized businesses may find the costs associated with big data management prohibitive.

Privacy and ethical considerations

With the collection and analysis of personal information, privacy concerns arise. Ethical considerations around consent, transparency, and the potential misuse of data must be addressed, requiring clear policies and adherence to ethical principles.

Solutions to Big Data management challenges

Data integration and quality control

  • Solution: Utilizing robust data integration platforms and employing data wrangling tools can automate cleaning and transforming data. Machine learning algorithms can detect inconsistencies while continuous monitoring maintains quality.

Data security and compliance

  • Solution: Implementing advanced encryption methods, multi-factor authentication, and strict access controls can enhance data security. Regular audits and alignment with regulatory frameworks ensure compliance with legal standards.

Performance and scalability

  • Solution: Leveraging cloud storage solutions and distributed computing frameworks allows organizations to scale according to data volumes. Virtualization and containerization technologies can enhance performance, making the system more responsive and resilient.

The complexity of data preparation

  • Solution: Automated data preparation tools can streamline the process, reducing the manual effort required. Tools equipped with AI (artificial intelligence) can learn from human inputs and progressively improve the efficiency of data cleaning and transformation (see the sketch after this item).
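As a small, hedged illustration of automated data preparation, the pandas sketch below handles a few of the issues described earlier, duplicates, missing values, and inconsistent formats, on a made-up customer dataset. Real pipelines would wrap such steps in reusable, monitored jobs.

```python
import pandas as pd

# Hypothetical raw customer records with duplicates, missing values, and inconsistent formats
raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 104],
    "email": ["A@EXAMPLE.COM", "a@example.com", None, "c@example.com", "d@example.com"],
    "signup_date": ["2023-01-05", "2023-01-05", "2023-02-05", "2023-03-01", "not a date"],
    "spend": ["120.5", "120.5", "80", None, "45"],
})

clean = (
    raw.assign(
        email=lambda d: d["email"].str.lower(),                        # normalise casing
        signup_date=lambda d: pd.to_datetime(d["signup_date"], errors="coerce"),
        spend=lambda d: pd.to_numeric(d["spend"], errors="coerce"),
    )
    .drop_duplicates(subset=["customer_id", "email"])                  # remove exact duplicates
)
clean["spend"] = clean["spend"].fillna(clean["spend"].median())        # simple imputation
print(clean)
```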

Real-time analysis and processing

  • Solution: Real-time data processing engines and in-memory computing enable instant analysis and response. By integrating these technologies into their existing systems, businesses can take advantage of opportunities or minimize risks as they arise.

Skill gap and resource constraints

  • Solution: Investing in training and development can bridge the skill gap while outsourcing specific functions to specialized service providers can alleviate resource constraints. Collaboration with educational institutions and industry partners can foster skill development within the community.

Privacy and ethical considerations

  • Solution: Developing clear privacy policies and ethical guidelines, together with transparent communication with stakeholders, can build trust. Regular reviews and ethical oversight committees can ensure ongoing alignment with societal norms and values.

Leveraging analytics and visualization tools

  • Solution: Analytics platforms and visualization tools can transform raw data into actionable insights. They allow businesses to identify trends, uncover hidden patterns, and make data-driven decisions. Custom dashboards can be created to cater to different stakeholders, ensuring that the information is presented in a user-friendly way.

Implementing distributed systems

  • Solution: Distributed systems like Hadoop and Spark allow for parallel processing, handling large data sets efficiently (see the batch-job sketch below). They provide fault tolerance and can be scaled up or down according to needs, offering flexibility and cost-effectiveness.
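
For a sense of how such frameworks spread work across a cluster, here is a minimal PySpark batch job that computes daily active users from a large event log; the input path and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-aggregation").getOrCreate()

# Spark splits the input into partitions and processes them in parallel across executors
events = spark.read.parquet("s3://example-bucket/events/")   # illustrative path

daily_active = (events
                .groupBy(F.to_date("event_time").alias("day"))
                .agg(F.countDistinct("user_id").alias("active_users"))
                .orderBy("day"))

daily_active.write.mode("overwrite").parquet("s3://example-bucket/daily_active/")
```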

Cloud-based solutions

  • Solution: Cloud providers offer scalable, flexible, and secure solutions for extensive data management. They provide the infrastructure and services needed, allowing organizations to focus on extracting value from the data rather than managing the underlying technology (a small object-storage sketch follows below).
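
As a small illustration of offloading storage to a managed cloud service, the sketch below uses boto3 to push a curated dataset to an S3 bucket; the bucket name is a placeholder and credentials are assumed to come from the environment or an IAM role.

```python
import boto3

# Credentials are assumed to come from the environment or an attached IAM role
s3 = boto3.client("s3")

# Upload a local dataset; the bucket and key names are placeholders
s3.upload_file(
    Filename="transactions_clean.csv",
    Bucket="example-analytics-bucket",
    Key="curated/transactions_clean.csv",
)

# Lifecycle rules, versioning, and access policies are then handled by the provider
```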

Machine learning and AI integration

  • Solution: Machine learning and AI can automate many processes within extensive data management, from predictive analytics to anomaly detection (an anomaly-detection sketch follows below). They can add intelligence to the system, adapting to changing conditions and providing personalized solutions.
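
As a hedged example of the anomaly-detection side, the sketch below fits scikit-learn's IsolationForest to two illustrative transaction features and flags outliers for review; the features, the synthetic data, and the contamination rate are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Illustrative features: transaction amount and transactions per hour
normal = rng.normal(loc=[50.0, 2.0], scale=[15.0, 1.0], size=(1000, 2))
suspicious = rng.normal(loc=[900.0, 40.0], scale=[50.0, 5.0], size=(10, 2))
X = np.vstack([normal, suspicious])

# contamination is a guess at the outlier share; tune it against labelled cases
model = IsolationForest(contamination=0.01, random_state=0).fit(X)

flags = model.predict(X)  # -1 marks suspected anomalies, 1 marks normal records
print(f"{(flags == -1).sum()} records flagged for review")
```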

Incorporating data governance framework

  • Solution: A well-defined data governance framework can guide data management within an organization. It ensures that data is handled consistently, responsibly, and transparently, aligning with both internal policies and external regulations.

Conclusion

In the digital transformation era, extensive data management has emerged as both an opportunity and a challenge for businesses. The complex landscape of data integration, security, scalability, and ethical considerations requires a comprehensive approach to harness its true potential. Organizations can overcome hurdles by employing innovative solutions, including data wrangling tools, and maximize the value derived from vast data resources. Embracing automation, cloud-based technologies, AI, and robust governance frameworks alleviates the complexity and propels businesses toward growth and sustainability. The future of extensive data management lies in strategic adaptation and continual innovation, shaping a new paradigm in the data-centric world.

Google is beefing up AI-powered search to help you better understand the results

Google is enhancing the generative AI capabilities for its search tool to help you make sense of the often complex information you find on the web.

In a blog post published Tuesday, the search giant highlighted three new features already available or soon to arrive in its Search Generative Experience, or SGE. Currently available as a Google Labs experiment, SGE brings AI to the company's traditional search engine with a summary, sources, and follow-up questions related to your topic.

First on the list is a way to see definitions of words in the search summary. Sometimes when you search for information on the web, the results contain words and terms that you might not fully understand or that you wish to explore further.

In this context, SGE will soon offer AI-generated definitions for terms related to science, history, economics, and other fields. After the update rolls out, you'll be able to hover over certain words in the summary to see a definition of them or view related images about them.

Next is a feature dubbed "SGE while browsing." Run a search on a certain topic and you may find a bunch of lengthy or complex web pages and articles that aren't easy to digest. To help you make sense of them, this new feature will serve up an AI-generated list of key points for supported articles. Links will also be available to take you to a specific spot in the article related to the information you're seeking. Plus, an explore section will generate questions that the article answers with links to the relevant sections.

To avoid getting into hot water with publishers, Google has set up SGE while browsing so that it works only with articles freely available on the web. That means it won't provide access to articles that are paywalled. Publishers can designate specific articles as free or paywalled to make sure only certain content is included in this type of search.

SGE while browsing is currently available in the Google app for iOS and Android and will reach Chrome on the desktop in a few days.

Next up is a new feature aimed at programmers. To help people learn more about coding, SGE has added new ways to understand and debug generated code. With this latest update, specific strings of code that appear in AI-generated summaries will be color-coded, while syntax will be highlighted. The goal is to help programmers more easily spot and identify such elements as keywords, comments, and strings.

To take advantage of the new features, you'll need to sign up for Google Labs if you haven't already done so. Once you're in, make sure you're using the latest version of Chrome and head to the Search Labs page. Turn on the switch for "SGE, generative AI in Search" to incorporate AI in your searches. When Chrome is ready for SGE while browsing, you'll be able to turn on that feature from here as well.

Big Tech and Generative AI: Will Big Tech Control Generative AI?

Generative AI is redefining the dynamics of human-computer interaction, emerging as a technological powerhouse that could establish itself as a standalone platform.

While the integration of AI into our daily lives was gradual up to 2021, tools such as ChatGPT have struck a deeper chord with global audiences, thanks to their vast utility in communication and creative domains.

The world of Generative AI, marked by further breakthroughs like Llama 2, GitHub Copilot, and Stable Diffusion, is revolutionizing not just technology but economies as well. Big Tech companies, recognizing the groundbreaking potential, have been pouring capital into this domain.

Figure: Generative AI market size projection (billion $)

The immense growth potential of generative AI is further validated by a recent report from Precedence Research. The global generative AI market was valued at a substantial USD 10.79 billion in 2022 and is expected to reach approximately USD 118.06 billion by 2032 with a 27% CAGR.
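
As a quick sanity check on those figures, the implied growth rate can be recomputed from the two endpoint values quoted above; the snippet below is just arithmetic on those numbers.

```python
start, end, years = 10.79, 118.06, 10   # USD billion, 2022 -> 2032

cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")      # roughly 27%

projected = start * (1 + 0.27) ** years
print(f"10.79B growing at 27% for 10 years: {projected:.1f}B")  # ~118B
```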

The plot below visualizes the monthly stock prices of five major tech companies: Microsoft (MSFT), Apple (AAPL), Alphabet (GOOGL), Amazon (AMZN), and Meta (META) from June 2022 to August 2023.
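
Such a plot can be reproduced in a few lines; the sketch below assumes the third-party yfinance package and network access, and pulls monthly closing prices over the period mentioned.

```python
import matplotlib.pyplot as plt
import yfinance as yf

tickers = ["MSFT", "AAPL", "GOOGL", "AMZN", "META"]

# Monthly bars from June 2022 through August 2023 (assumes yfinance + internet access)
data = yf.download(tickers, start="2022-06-01", end="2023-09-01", interval="1mo")

closes = data["Close"]                  # one column of closing prices per ticker
closes.plot(figsize=(10, 5), title="Monthly close prices, Jun 2022 - Aug 2023")
plt.ylabel("Price (USD)")
plt.tight_layout()
plt.show()
```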

While it's tempting to draw a direct link between these significant generative AI events and the stock prices of big tech companies, it's crucial to understand the stock market's multifaceted nature. Numerous factors, from global economic trends to geopolitical situations, can influence stock prices.

However, one can't overlook the prominence generative AI has gained and how its milestones may have played a role in investor sentiment and decision-making. The correlation between major AI events such as

  • Event 1: ChatGPT (December 2022)
  • Event 2: Google Bard (February 2023)
  • Event 3: Meta Llama 2 (July 2023)

and stock price movements during this period suggests that investors are keenly watching the AI space.

Noteworthy mentions include OpenAI's rapid progression from ChatGPT to the more advanced GPT-4 and Anthropic's AI Claude 2, which showcased remarkable processing enhancements in a short span. Even Elon Musk has ventured further into the AI realm by founding a new AI-focused company named X.AI. As revealed on the X.AI website, the compact yet formidable team of 12 is set on a mission to “understand the true nature of the universe”.

Given the money flow in this industry, it's evident that Big Tech recognizes the potential of Generative AI and is actively seeking to shape its trajectory.

The Triad of Big Tech Dominance in Generative AI: Data, Power, and Ecosystem

There are several reasons to believe that Big Tech could exert significant influence over Generative AI:

1. Data

Data is the bedrock of AI. Companies that can access vast and varied datasets have a clear advantage in AI product development. This “Data Advantage” is glaringly evident in Big Tech's strategic moves. With billions of users, these tech giants have effectively turned data acquisition into a virtuous cycle: more data leads to better products, which in turn attracts more users and even more data.

2. Computing Power

Beyond data, deploying advanced AI models requires immense computing power. The hardware and infrastructure required to train, fine-tune, and deploy these models are not only costly but also necessitate specialized knowledge and skills. This “Computing Power Advantage” ensures that while AI startups are emerging everywhere, most remain dependent on Big Tech's infrastructure. These startups often become acquisition targets, further amplifying the industry's consolidation.

3. Ecosystem Control

One of the notable capabilities of Big Tech is its ability to create integrated ecosystems that extend its reach. From search engines to smart devices, cloud platforms to e-commerce, their services are often interconnected. This interconnectedness facilitates the seamless integration of AI applications. For generative AI, this means a direct path to users across multiple touchpoints.

Take the example of “Midjourney,” which currently produces some of the highest-quality commercial AI images, yet is accessible only through Discord. Being bound to a single access point limits the startup's reach, especially when compared to products or services embedded within the vast ecosystems of Big Tech companies.

The Current Landscape of Big Tech and Generative AI

Generative AI is the talk of the town, transforming the technology landscape with potential that's as exciting as it is boundless. Both giants of the tech industry and emerging startups are making significant strides in this space, a clear indication that generative AI is more than just a buzzword. It’s shaping up to be the next frontier in tech innovation. Let’s delve deeper into what some of the industry leaders are up to.

Meta

Meta has its sights set on two major areas: Recommendations/Ranking and Generative models. The immense growth in organic engagement on platforms like Instagram, powered by AI recommendations, showcases the prowess of AI in enhancing user experience.

In contrast to competitors like Google and OpenAI who maintain proprietary stances on their AI models, Meta's open-source initiative represents a bold stand against restrictive tech practices. The underlying philosophy is voiced by CEO Mark Zuckerberg, who emphasizes the pivotal role of open-source software in propelling innovation. Llama 2's open-source model stands as an invitation to global developers, granting them access to iterate and innovate atop this foundation.

Other recent innovations from Meta include:

  1. Music & Audio: According to a recent article, Meta has introduced “AudioCraft”, a generative AI designed specifically for music and audio. This could revolutionize the way creators produce and modify music, making the process more intuitive and expansive.
  2. Text & Images: Meta also launched CM3leon, an AI capable of generating text and images seamlessly. The implications for content creators and advertisers could be game-changing.
  3. Integration with Social Platforms: Not limiting generative AI to standalone projects, Meta is strategically integrating these technologies into its platforms such as WhatsApp, Messenger, and Instagram. This could herald a new era of user experience on these platforms, from customized content generation to enhanced interactivity.

Microsoft

Ever since its groundbreaking investment in OpenAI, Microsoft has been relentless in its pursuit of Generative AI dominance.

Their partnership has birthed innovations like the Azure OpenAI Service, enhancing the capabilities of Microsoft's cloud offerings. This fusion is further exemplified by the introduction of GitHub Copilot, showcasing the profound impact of AI on coding and development.

Yet, it's in consumer-centric services where Microsoft’s AI prowess becomes especially tangible. AI-enhanced features in Bing and Edge, such as conversational AI chatbots for search queries and content generation, have elevated user interactions with the digital realm.

Their latest unveilings, Bing Chat Enterprise and Microsoft 365 Copilot, signal a bold step toward transforming workplace productivity and collaboration.

Amazon

Amazon, not one to be left behind, has its own story to tell in the world of AI. In a recent earnings call, Amazon CEO Andy Jassy revealed that “every single one” of Amazon's business sectors is deeply engaged with “multiple generative AI initiatives.” Amazon's cloud offering, AWS, has introduced tools specifically aimed at building with Generative AI.

Amazon's Alexa AI is shifting from supervised learning to a new paradigm of generalizable intelligence, reducing its reliance on human-annotated data. This move has birthed the “Alexa Teacher Models” (AlexaTM), large-scale multilingual systems inspired by OpenAI's GPT-3. Unlike most models, the AlexaTM 20B uses a unique sequence-to-sequence encoder-decoder design.
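
AlexaTM 20B itself is not assumed to be publicly downloadable, but as a rough illustration of what a sequence-to-sequence encoder-decoder looks like in practice, the sketch below runs a small, publicly available T5-family model through Hugging Face transformers as a stand-in (assumes the transformers and torch packages).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Public stand-in for an encoder-decoder model (not AlexaTM itself)
name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# The encoder reads the full prompt; the decoder generates the output sequence
prompt = "Translate English to German: The match starts at seven."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```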

Google

During its I/O conference in May 2023, Google repeatedly emphasized its transition into an 'AI-first' company, to the point where it became a meme. With a slew of announcements, the tech giant is not just aiming to catch up with its peers but to pioneer new avenues in AI.

Their answer to ChatGPT, Bard, powered by their Language Model for Dialogue Applications (LaMDA), showcases their ambitions. Sundar Pichai envisions Bard not just as a chatbot but as a tool that can tap into the vast information reservoir of the web and provide intelligent, creative responses to users.

Apple

Apple, known for its close-guarded strategies, has been relatively silent about its specific plans in the AI arena. However, considering its historical emphasis on user experience and innovation, the tech community is eagerly waiting for Apple’s next big move. Given the comments from Tim Cook, it’s evident that AI holds importance in Apple's roadmap.

According to a Bloomberg report, Apple is gearing up to launch AJAX and Apple GPT. These AI tools are seen as Apple's counter to offerings from OpenAI and Google, signaling a heated competition ahead.

A clear testament to Apple's commitment to generative AI is its recent job listing for a Generative AI Applied Researcher. Apple is not just investing in technology but also in talent, ensuring they remain at the forefront of AI research and application.

Emerging Stars in Generative AI

Despite the firm grip of big tech on generative AI, there are startups that are not just surviving but thriving, offering innovative solutions and challenging the status quo. Their unique propositions, deep-rooted commitment to innovation, and community-centric approach underscore the vast potential and adaptability of the AI sector.

Hugging Face stands out as a frontrunner, bolstered by its emphasis on community-driven AI. Valued at approximately $2 billion, this entity offers open-source AI model development, fostering a sense of inclusiveness and collective growth within the AI community.

Stability AI has emerged as an influential player in the realm of AI-driven visual arts. Their signature offering, Stable Diffusion, translates textual inputs into images. With a valuation hovering around the $1 billion mark and operating out of London, Stability AI's recent exponential search growth attests to its rising influence. DreamStudio, one of its marquee platforms, empowers users to harness AI's might for crafting unique designs. Stability AI's emphasis on open-source tools resonates with its commitment to democratizing generative AI access.

Anthropic, focusing on AI safety and tailored content generation, represents another vibrant facet of this emerging ecosystem. Valued at a staggering $5 billion, this American startup has captured the attention of the tech behemoths, securing nearly $400 million from Google — underscoring the intertwined relationship and keen interest of big tech in these startups. A noteworthy product from Anthropic is Claude, an AI chatbot, which, akin to ChatGPT, provides users with detailed, context-relevant responses. Their pedigree, steeped in expertise from former OpenAI members, lends them a unique edge.

Lastly, Midjourney, headquartered in San Francisco, is gaining traction as a generative AI image generator. Although the specifics about their funding remain undisclosed, their remarkable growth trajectory, as evidenced by a 5800% surge in search growth over five years, is hard to overlook. The platform has garnered over 15 million users, all weaving artistic tapestries using its robust features.

In the Grasp of Giants: Big Tech's Hold on Generative AI

Despite being a subset of the broader AI sector, investment in generative AI has surged, reaching a staggering $12 billion within the first five months of 2023 alone. From providing enriched communication channels to fostering unmatched creativity, its essence lies in reshaping and augmenting human experiences.

The verve with which giants like Amazon, Microsoft, and Google are advancing in this domain testifies to its strategic importance. Yet, it's not just about monetary investment or market share. Generative AI's prowess is its influence, be it in shaping investor sentiments, redefining digital landscapes, or altering our very expectations from technology.

However, a pivotal question lingers: Will Big Tech's dominance stifle or stimulate the generative AI sector? While their immense resources can accelerate AI research and applications, the potential for monopolistic control is undeniable.

Notably, the rise of emerging stars in the AI realm, such as Hugging Face and Stability AI, offers a glimmer of hope. Their success stories affirm that innovation, community-driven development, and a clear vision can pave the way for success even amidst giants.

While Big Tech's involvement can catalyze advancements, maintaining a diversified AI ecosystem where startups and innovators can thrive is essential for sustainable growth.