Determine Which Version of Microsoft Copilot is Right for You

Copilot has quickly become a vital and strategic part of the Microsoft lineup of products and services. Whether you are a large enterprise, small business or merely a regular user of a Windows personal computer, at the very least, you are likely aware of Microsoft Copilot and its generative AI capabilities.

Microsoft is currently offering three versions of Copilot to its customers:

  • The free Copilot version is available to everyone using Windows, Microsoft Edge or the Bing website.
  • The Copilot Pro version is also available to everyone using Windows, Microsoft Edge or the Bing website, but requires an additional $20/month subscription fee.
  • The Copilot 365 version is available to Microsoft 365 subscribers who pay an additional $30/month/user subscription fee.

But which version of Microsoft Copilot is right for you and your business, and what should you know about each version besides the price of access?

Microsoft Copilot version comparison

Feature | Microsoft Copilot | Copilot Pro | Copilot 365
Functional operation | General questions return general answers | General questions return general answers | Specific questions return specific answers
Data source | Bing searches and the internet | Bing searches and the internet | Organizational data generated internally
Cost | Free | $20/user/month | $30/user/month
Best use case | General users and small businesses | SMBs with a need for reliability and speed | SMBs and enterprises with large data pools
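As a quick budgeting sketch, the per-seat prices in the table above translate into annual figures as follows (the 50-seat team is an arbitrary example):

```python
# Annual per-user cost of the paid Copilot tiers, from the pricing above.
PRO_MONTHLY = 20          # Copilot Pro, $/user/month
COPILOT_365_MONTHLY = 30  # Copilot 365 add-on, $/user/month (on top of Microsoft 365)

def annual_cost(monthly_per_seat, seats=1):
    """Yearly cost for a given monthly per-seat fee."""
    return monthly_per_seat * 12 * seats

print(annual_cost(PRO_MONTHLY))              # 240 per user per year
print(annual_cost(COPILOT_365_MONTHLY, 50))  # 18000 for a 50-seat team
```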

How to choose the best version of Microsoft Copilot

Regardless of which version of Microsoft Copilot you decide to use, the software’s basic operation remains the same: Ask a question, get an answer. The differences between each version rest in how the AI accesses its foundational data sources and where that data comes from.

Free Microsoft Copilot

The free version of Microsoft Copilot pulls the data it uses for its natural language processing models from Bing searches and other aggregated information tracked by Microsoft. In practice, the AI’s generative results are based on Microsoft’s internal sources, external sources reached via Bing Search and various custom data sources.

This gives the Copilot platform leeway to answer any general question (Figure A). Often, this makes it less effective when users require a specific and detailed answer to a specific and detailed question. The free version of Microsoft Copilot is intended for general use and should be used in that manner. If general answers to general questions are what you need from generative AI, then Copilot may be all you need.

Figure A: Microsoft Copilot answering a general question. Image: Mark W. Kaelin/TechRepublic

Microsoft Copilot Pro

The latest addition to the company’s generative AI line of software is called Microsoft Copilot Pro. This version of Copilot uses the same NLP models based on Bing searches and other aggregated information as the free version, but for an additional $20/user/month, the pro version grants priority access to GPT-4 and GPT-4 Turbo during peak times for accelerated performance.

In other words, Microsoft Copilot Pro will prioritize access to servers so you can get answers from its AI faster than those customers using the free version only. At least, that is the theory.

The reality is that the time difference between getting an answer from Copilot with the pro version and getting the same answer from the free version is almost negligible at this point. However, assuming AI continues to grow in popularity and use, it is possible getting an answer from Microsoft Copilot could slow down, making priority access much more appealing in the future.

If you plan to rely on Copilot for critical interactions, it might be in your best interest to pay for a subscription that grants priority access. With that said, it may also be financially prudent to wait for that need to arise first. You will have to decide which is best for your situation.

Microsoft Copilot 365

Microsoft Copilot 365 differs substantially from the free and Pro versions of the platform in where and how it pulls the data used by its NLP model: Copilot 365 draws on the data generated by the host organization itself.

Microsoft Copilot 365 is best used in enterprises, organizations and SMBs that generate large amounts of institutional data. By limiting what data goes into Copilot’s NLP data models, institutional customers can better control what answers the AI will generate. Think of it as easily accessible systematic institutional knowledge. Access to that institutional knowledge will cost $30/user/month, which is in addition to the normal Microsoft 365 subscription.

To illustrate the potential, consider this example. Let’s say you need to include the new official corporate logo for a sales proposal. Typically, you would email the marketing department asking for a link to the logo file; this transaction may require you to wait until the next day for a reply, possibly losing a sale. However, Copilot 365 should already know what the new official logo is and where to get it. By asking Copilot 365, you can find the official link to the new corporate logo in seconds, eliminating the need for an exchange of emails.

The caveat is that the institutional data accessed by Copilot must be accurate, vetted, verifiable and accessible. For Microsoft Copilot 365 to work effectively and efficiently, the institution involved must abandon the traditional concepts of compartmentalized and departmentalized information. For many enterprises and SMBs, this attitude adjustment with regard to data will be a large hurdle to overcome.

Microsoft Copilot 365 has the potential to be an extremely powerful tool for organizations willing to embrace the platform, the technology and the mindset required to implement it. Without that buy-in at all levels of the organization, AI also has the potential to be a terrible waste of energy, resources and time.

If your organization possesses a forward-looking collective mindset, Microsoft Copilot 365 may be exactly what you need to establish a competitive edge in your business marketplace. If implemented correctly, Copilot 365 should provide users with reliable, accurate and specific answers to specific questions. Decision makers at each organization will have to determine if the potential for solid answers is worth the implementation effort and the associated cost.

How do you choose a version of Microsoft Copilot?

The choice of which version of Microsoft Copilot will work best for you or your organization is entirely dependent on you and what you want from generative AI, including if you want nothing at all. At this point in the development cycle, the overall effectiveness of all AI platforms varies greatly. Microsoft Copilot has tremendous potential. All enterprises and SMBs should at least be experimenting with the platform.

Much like cloud computing and the Internet of Things were once technologies on the horizon, generative AI is a technology that is still in its nascent development period. But just like those innovations, AI will play a vital role in future work environments. Therefore, it will be advantageous for organizations, businesses and users — operating at all levels — to become familiar with the capabilities and limitations of Microsoft Copilot and other AI platforms.

Elon Musk’s xAI Unveils Grok-1.5 Vision, Beats OpenAI’s GPT-4V  


Elon Musk’s AI startup, xAI, has introduced Grok-1.5V, its first-generation multimodal model. In addition to its strong text capabilities, Grok can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.

Grok-1.5V will be available soon to early testers and existing Grok users.

Grok-1.5V’s notable feature is its ability to understand real-world spatial concepts, surpassing other models in the RealWorldQA benchmark—an important measure of a model’s practical grasp of physical environments.

In a comparative analysis against leading models like GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5, Grok-1.5V shows competitive advantages across several benchmarks, highlighting its versatility and strength.

One of Grok-1.5V’s standout features is its ability to translate complex visual information into executable code. For example, when given a flowchart depicting a guessing game, Grok-1.5V easily converts it into Python code, showcasing its practical application in problem-solving scenarios.
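xAI has not published the exact code Grok produced, but the flowchart-to-code task described, a guessing game, would plausibly come out as something like this sketch (the function shape and the list of guesses are invented for illustration):

```python
def guessing_game(target, guesses):
    """Follow a simple guessing-game flowchart: take a guess, compare
    it to the target, report too low/too high, and stop on a match."""
    for attempt, guess in enumerate(guesses, start=1):
        if guess == target:
            return f"correct in {attempt} attempts"
        print("too low" if guess < target else "too high")
    return "out of guesses"

print(guessing_game(7, [5, 9, 7]))  # correct in 3 attempts
```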

Looking forward, the developers of Grok-1.5V anticipate significant improvements in multimodal capabilities across images, audio, and video, signaling a promising path towards building beneficial Artificial General Intelligence (AGI) that comprehensively understands and interacts with the universe.

Grok-1.5V follows the recent introduction of Grok-1.5 by xAI, featuring enhanced reasoning capabilities and a context length of 128,000 tokens. Grok-1.5 boasts notable improvements, particularly in coding and math-related tasks. It beats Mistral Large on various benchmarks including MMLU, GSM8K and HumanEval.

The post Elon Musk’s xAI Unveils Grok-1.5 Vision, Beats OpenAI’s GPT-4V appeared first on Analytics India Magazine.

Can Microsoft Copilot Write Simple SQL Commands for You?

According to the claims made by large technology companies, generative artificial intelligence platforms like Microsoft Copilot will make the working lives of people at all levels of a business organization more productive and efficient. But is that just hyperbole, or do the claims have some merit?

For example, can the free version of Microsoft Copilot, available to all Windows users, write a common SQL query for you on command? That ability would be extremely useful for a time-constrained, less-experienced employee trying to meet a deadline. Below, we explore what happens when we ask Microsoft Copilot for some SQL help.

DOWNLOAD: AI quick glossary from TechRepublic Premium

Testing Microsoft Copilot’s ability to write simple SQL commands

For our experiment, we started with something basic and simple. Using Creative mode, we asked Microsoft Copilot for the code required to create a table in SQL named Employee that included four columns: EMP_ID, Name, Salary, Age. The AI provided this code:
CREATE TABLE Employee (
    EMP_ID INT PRIMARY KEY,
    Name VARCHAR(100),
    Salary DECIMAL(10, 2),
    Age INT
);

You can see the prompt and code in Figure A.

Figure A: Asking Microsoft Copilot how to create a simple SQL table. Image: Mark W. Kaelin

While this code would likely require tweaking depending on your specific needs, in general, the code was accurate and useful based on what was asked of Copilot.

Note the explanation provided by Microsoft Copilot as well; it is as important as the code:

“This statement defines EMP_ID as an integer and the primary key for the table, Name as a variable character string with a maximum length of 100 characters, Salary as a decimal with 10 digits in total and 2 digits after the decimal point, and Age as an integer. Adjust the data types and sizes according to your specific requirements.”

Continuing our session with Microsoft Copilot, we then asked the AI to generate the code necessary to insert a new record into this table. The resulting code is below:

INSERT INTO Employee (EMP_ID, Name, Salary, Age) VALUES (1, 'John Doe', 55000.00, 30);

You can see the prompt and code in Figure B.

Figure B: Asking Microsoft Copilot how to insert a record into a SQL table. Image: Mark W. Kaelin

The VALUES section would need to be modified for your specific needs, but the code and the explanation provided by Copilot were accurate and useful. It’s important to note that once our table of employees is complete, we may be tasked with writing queries to pull data from our database.

Next in our session with Microsoft Copilot, we asked the AI to generate the code necessary to list employees from this table with a certain age or greater. In this case, Microsoft Copilot suggested this code:

SELECT * FROM Employee WHERE Age >= 30;

You can see the prompt and code in Figure C.

Figure C: Asking Microsoft Copilot to select records from a SQL table. Image: Mark W. Kaelin

Again, while you may need to adjust the code for your specific needs, this code was accurate and useful for employees less familiar with SQL syntax.
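The three Copilot-generated statements can be verified end to end without a database server, for instance with Python’s built-in sqlite3 module (SQLite maps the VARCHAR and DECIMAL declarations onto its own type affinities):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Copilot's CREATE TABLE statement (Figure A)
cur.execute("""CREATE TABLE Employee (
    EMP_ID INT PRIMARY KEY,
    Name VARCHAR(100),
    Salary DECIMAL(10, 2),
    Age INT
)""")

# Copilot's INSERT statement (Figure B)
cur.execute("INSERT INTO Employee (EMP_ID, Name, Salary, Age) "
            "VALUES (1, 'John Doe', 55000.00, 30)")

# Copilot's SELECT statement (Figure C)
rows = cur.execute("SELECT * FROM Employee WHERE Age >= 30").fetchall()
print(rows)  # the one matching employee record
```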

So, when asked the right way, Microsoft Copilot can write basic SQL code for your employees. However, whatever answers are generated by Copilot should always be scrutinized for applicability and accuracy.

DOWNLOAD: AI vs machine learning differences and use cases from TechRepublic Premium

What have we learned about Microsoft Copilot and SQL code generation?

We can draw these conclusions from our experiment with Microsoft Copilot:

  • Under the right conditions, and when asked the right questions, Copilot can provide useful and accurate SQL commands that employees can apply to their work tasks.
  • The free version of Microsoft Copilot derives its “knowledge” by modeling data from Bing searches and the internet. Tutorials explaining basic SQL commands and offering SQL tips are common on the internet, so asking Copilot about SQL commands is likely to return useful answers. However, asking about topics that are not prevalent or well explored on the internet may not be as successful.
  • Relying on data generated from the internet for business decisions can be a risky activity. The internet is infamous for inaccurate and misleading information, and answers provided by Microsoft Copilot, in certain situations, could be tainted by these inaccuracies.
  • Even if the answers provided by Microsoft Copilot are useful, they still must always be vetted and filtered by employees for accuracy and applicability. While generative AI can be a powerful tool, it should seldom be trusted with making final decisions or acting on its own.
  • One of the most powerful aspects of Microsoft Copilot sessions is the AI’s ability to iterate answers. In our example, we were able to use Copilot’s previous answers to our questions as a foundational basis for our next questions. This capability allows employees to have a conversation with the AI and then build toward the most accurate and useful answer.
  • Our example also shows that the more detailed the question submitted to Microsoft Copilot, the more accurate the generated answer. By including variable names, Copilot was able to provide complete answers and not just general SQL command tips. The ability to formulate detailed questions is what separates a simple search from a useful Copilot session.
  • Using Microsoft Copilot requires an adjustment in thinking by employees seeking to use it as a work tool. Copilot is not just another search engine, and it should not be approached that way. Essentially, employees are asking Copilot to read and interpret information available on the internet for them and then present useful, accurate and viable answers to their questions. The questions employees ask of Copilot will be fundamentally different from the questions they ask of a basic search engine.

Should employees trust Microsoft Copilot for work tasks?

We have shown that Microsoft Copilot can be a useful productivity tool for your employees, but only if it is used correctly. Employees must realize that Copilot, and any other generative AI platform, is not just another search engine. Questions submitted to Copilot must be thought out, detailed and specific. The more detailed the question, the more detailed the answer. Employees must also realize that the first question is often just the foundation that leads to a more useful and enlightening conversation with Copilot.

Ola Krutrim Makes History with In-House Cloud Infrastructure, Skips AWS and Azure


Bhavish Aggarwal, the Ola chief, has announced a major breakthrough: Krutrim is now running on its own cloud infrastructure.

“The Krutrim Team is making some major improvements to the model and also the infra,” said Aggarwal in a post on X, adding that Krutrim is running on its own cloud infrastructure rather than a provider like AWS or Azure.

.@Krutrim is now crazy fast!
The @Krutrim Team is making some major improvements to the model and also the infra.
Also, कृत्रिम is running on our own cloud infra. Not AWS or Azure. pic.twitter.com/G6XctkQcex

— Bhavish Aggarwal (@bhash) April 12, 2024

Recently, Intel also announced that Ola Krutrim is utilising Intel Gaudi 2 clusters to pre-train and fine-tune its foundational models with generative capabilities in ten languages, achieving industry-leading price/performance ratios compared to existing market solutions.

Additionally, Krutrim is currently pre-training a larger foundational model on an Intel Gaudi 2 cluster, further advancing its AI capabilities.

A few days ago, Krutrim announced its partnership with Databricks to improve its foundational language model, particularly for Indian languages, aiming to enhance AI solutions in India.

“The Krutrim model was launched using our platform,” said Naveen Rao, VP of generative AI at Databricks, during an exclusive interview with AIM.

Ola Krutrim has been quite obsessed with developing its own foundational model from scratch, despite rumours that it is being built on fine-tuned models such as Llama-2, Mistral, Claude-3 or even the most recent, DBRX.

In December last year, Ola’s chief Aggarwal unveiled Krutrim (which means artificial in Sanskrit). It has also been touted as “India’s first full-stack AI” solution. At first glance, the platform bears a stark resemblance to ChatGPT, at least in its UI/UX, albeit with a greenish colour scheme.

Aggarwal claimed that Krutrim AI is better than GPT-4 in various Indic languages. He said it is trained on 2 trillion tokens and can understand over 20 Indian languages and generate content in about 10 languages, including Marathi, Hindi, Bengali, Tamil, Kannada, Telugu, Odia, Gujarati, and Malayalam.

The post Ola Krutrim Makes History with In-House Cloud Infrastructure, Skips AWS and Azure appeared first on Analytics India Magazine.


Cohere Releases Rerank 3, Integrates Enterprise Search and RAG Capabilities

Cohere has unveiled Rerank 3, a new foundation model built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems.

This new model from Cohere will change how businesses handle and access large amounts of data, improving search efficiency and accuracy. Rerank 3 integrates smoothly with any database, search index, and legacy applications with native search capabilities.

Its remarkable feature set includes a 4k context length, enabling superior search quality for longer documents, and the ability to search across multi-aspect and semi-structured data such as emails, invoices, JSON documents, code, and tables.

Moreover, with support for over 100 languages, Rerank 3 ensures robust multilingual performance, simplifying retrieval for diverse user bases.

One of the key highlights of Rerank 3 is its exceptional precision in semantic reranking, ensuring that only the most relevant information is delivered to generative models. This optimised approach not only enhances response accuracy but also keeps latency and costs at minimal levels, particularly beneficial when dealing with extensive data repositories.
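Cohere’s hosted API does the actual scoring, but the role a reranker plays in a RAG pipeline can be sketched generically: retrieve a broad candidate set cheaply, re-score each candidate against the query, and pass only the best few to the generative model. The toy term-overlap scorer and sample documents below are stand-ins for the learned model and a real corpus:

```python
def rerank(query, documents, top_n=2):
    """Return the top_n documents most relevant to the query.
    A real reranker such as Rerank 3 replaces `score` with a
    learned relevance model; term overlap is a crude stand-in."""
    query_terms = set(query.lower().split())

    def score(doc):
        # Count how many query terms appear verbatim in the document.
        return len(query_terms & set(doc.lower().split()))

    return sorted(documents, key=score, reverse=True)[:top_n]

candidates = [
    "Invoice #1042: total due 30 April",
    "Quarterly sales report for EMEA",
    "Email: invoice 1042 payment confirmation",
]
print(rerank("invoice 1042 payment", candidates))
```

Only the reranked top few candidates are handed to the generative model, which is what keeps latency and cost down on large repositories.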

Enterprise data is often complex and current systems have difficulty searching through multi-aspect and semi-structured data sources. Rerank 3 excels in ranking multi-aspect data, leveraging metadata fields to deliver comprehensive and accurate search results.

Its capabilities extend to code retrieval, facilitating productivity gains for engineering teams working with proprietary code repositories. Furthermore, Rerank 3 stands out with up to 3x improvements in latency at long context lengths compared to its predecessor.

Rerank 3 is supported natively in Elastic’s Inference API. This collaboration with Elastic signifies a strategic step towards empowering businesses with advanced search capabilities tailored to their complex data needs.

“We’re excited to be partnering with Cohere to help businesses unlock the potential of their data,” said Matt Riley, GVP & GM of Elasticsearch.

The post Cohere Releases Rerank 3, Integrates Enterprise Search and RAG Capabilities appeared first on Analytics India Magazine.

Data Centres’ Lean Towards Nuclear-Powered Future to Combat Energy Needs

According to recent industry reports, the global electricity consumption of data centres is projected to reach a staggering 848 terawatt-hours (TWh) by 2030, nearly doubling from the estimated 460 TWh consumed in 2022.

To put these figures into perspective, India, the world’s second-most populous country, consumed 1,443 TWh of electricity in 2021.

The projected 2030 data centre consumption would equal more than half of India’s electricity usage. By comparison, Ireland generated just 34.5 TWh of electricity in 2022, well under a tenth of what data centres already consume.
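The comparisons above follow directly from the cited figures:

```python
# All figures in terawatt-hours (TWh), as cited above
data_centres_2022 = 460    # estimated global data-centre consumption
data_centres_2030 = 848    # projected consumption
india_2021 = 1_443         # India's total electricity consumption
ireland_2022 = 34.5        # Ireland's total generation

print(round(data_centres_2030 / data_centres_2022, 2))  # 1.84: nearly doubling
print(round(data_centres_2030 / india_2021, 2))         # 0.59: more than half of India's usage
print(round(data_centres_2022 / ireland_2022, 1))       # 13.3: already dwarfs Ireland's output
```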

So, as data centre operators confront energy challenges like scarcity of available energy for powering data centres from the grid and an imperative to minimise carbon emissions—they’re now moving towards nuclear energy!

Equinix, the world’s largest data centre colocation provider, recently signed a pre-agreement with Oklo, a firm that builds small modular reactors (SMRs), to purchase nuclear energy, including a $25 million prepayment.

A Small Modular Reactor (SMR) is generally defined as an advanced reactor that produces up to 300 MW(e) per module.

As the energy demands of data centres, which power critical digital infrastructure and technologies like generative AI, 5G and IoT, continue to grow, these SMRs represent a critical move towards clean, sustainable data centres powered by safe nuclear energy.

“A normal data centre needs 32 megawatts of power flowing into the building. For an AI data centre, it’s 80 megawatts,” says Chris Sharp, CTO at Digital Realty, a US data centre giant.

Oklo’s fission reactors stand as a viable option for data centres: each can generate 15 MW and run for a minimum of 10 years before requiring refuelling.

Equinix intends to purchase power from Oklo’s upcoming SMR installations to fuel its US data centres. It will possess the first option for 36 months to acquire between 100MW and 500MW of cumulative capacity from specific Oklo powerhouses.
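Back-of-envelope arithmetic (not a figure from Equinix or Oklo) gives a sense of how many reactors that option range implies at 15 MW apiece:

```python
import math

reactor_mw = 15                            # output of one Oklo reactor, per the article
option_low_mw, option_high_mw = 100, 500   # cumulative capacity range Equinix may acquire

units_low = math.ceil(option_low_mw / reactor_mw)    # 7 reactors to cover 100 MW
units_high = math.ceil(option_high_mw / reactor_mw)  # 34 reactors to cover 500 MW
print(units_low, units_high)
```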

Additionally, smaller microreactors, with capacities ranging from 1 to 20 MW, are being developed specifically to power data centres and industrial sites. Startups like Oklo aim to deploy factory-built microreactors by 2028 to meet these energy demands.

Besides Oklo, several other US-based firms, including NuScale Power, Kairos Power, and X-energy, are actively developing small modular reactors (SMRs). Additionally, UK-based Rolls-Royce is also pursuing SMR technology.

Challenges in Adoption

Public opinion is nearly evenly split between support and opposition, and safety remains a primary concern, one that many SMR manufacturers are working to address.

For instance, NuScale Power, a leading developer of SMRs, has made safety a top priority in its design, according to CEO and co-founder José Reyes.

Reyes, who spent nearly a decade as a research engineer in the Reactor Safety Division of the U.S. Nuclear Regulatory Commission, emphasises that NuScale’s SMRs are designed to safely shut down without operator intervention in the event of a worst-case scenario.

To address the concerns arising from the nuclear disasters at Fukushima, Chornobyl and Three Mile Island, SMRs’ smaller size, simpler design and inherent safety features make them more resilient and lower-risk.

SMRs also generate less nuclear waste, as they require refuelling every 3-7 years or even up to 40 years, compared to 1-2 years for conventional nuclear power plants.

Additionally, SMRs can be installed closer to energy consumers and data centres, overcoming grid constraints.

Jay Dietrich from the Uptime Institute has also highlighted that SMRs provide reliable, carbon-free electricity, which can complement intermittent renewable energy sources like wind and solar power to help data centres become more sustainable.

Despite the potential benefits, the SMR market faces several challenges. Given hurdles such as manufacturing ramp-up, design permits, site approvals and grid connection permissions, SMR-powered data centres are unlikely to become a reality before the late 2020s or early 2030s.

The emerging SMR market recently faced a setback when NuScale’s plan to launch a six-reactor, 462 MW project with Utah Associated Municipal Power Systems collapsed in early November.

Several towns withdrew from the project after costs rose, highlighting the challenges faced by the nascent industry.

Initial construction costs for SMRs may also be high, but economies of scale are expected to lower costs over time.

Hans Lohse, a representative from Idaho National Laboratory (INL), believes that economies of scale will play a significant role in reducing SMR costs.

“I don’t think anyone expects the first couple of builds to be the cheapest, but when you get the supply chain going, the cost curve will go down, and you will get economies of scale,” Lohse said.

Another concern surrounding SMRs is nuclear waste production, albeit in smaller quantities compared to traditional large-scale nuclear plants.

While China’s Linglong One reactor became the first small reactor to receive safety approval from the IAEA, the success of SMRs will depend on continued public engagement, regulatory support, and a demonstrated track record of safe operation in rural and urban locations.

Big Tech Also Turns to SMR

Tech giants like Microsoft and Amazon have also shown significant interest in SMRs and nuclear power purchase agreements as a way to source a portion of their data centres’ growing electricity needs from nuclear plants.

Microsoft is exploring nuclear power deals with companies like Helion and Ontario Power Generation for its operations in Canada. The tech giant has reiterated its seriousness in the matter by employing a director of nuclear development acceleration and a director of nuclear technologies to spearhead its exploration.

Concurrently, Amazon has acquired a massive 960 MW data centre campus in Pennsylvania, fuelled by the Susquehanna nuclear power plant. While that deal does not involve SMRs directly, it demonstrates hyperscalers’ growing interest in nuclear energy for powering data centres.

The first SMR reactors will likely be installed on existing nuclear sites, where infrastructure and permitting are already in place. Rolls-Royce is starting with a decommissioned nuclear plant in Trawsfynydd, Wales, which could hold two 470MW systems.

In Canada, Ontario Power Generation (OPG) is building up to four new SMRs in Darlington, Ontario, alongside its existing CANDU reactors. OPG has signed a Power Purchase Agreement with Microsoft, which may include nuclear power from these SMRs if they are operational in time.

The post Data Centres’ Lean Towards Nuclear-Powered Future to Combat Energy Needs appeared first on Analytics India Magazine.

Revolutionizing AI with Apple’s ReALM: The Future of Intelligent Assistants


In the ever-evolving landscape of artificial intelligence, Apple has been quietly pioneering a groundbreaking approach that could redefine how we interact with our iPhones. ReALM, or Reference Resolution as Language Modeling, is an AI model that promises to bring a new level of contextual awareness and seamless assistance.

As the tech world buzzes with excitement over OpenAI's GPT-4 and other large language models (LLMs), Apple's ReALM represents a shift in thinking – a move away from relying solely on cloud-based AI to a more personalized, on-device approach. The goal? To create an intelligent assistant that truly understands you, your world, and the intricate tapestry of your daily digital interactions.

At the heart of ReALM lies the ability to resolve references – those ambiguous pronouns like “it,” “they,” or “that” that humans navigate with ease thanks to contextual cues. For AI assistants, however, this has long been a stumbling block, leading to frustrating misunderstandings and a disjointed user experience.

Imagine a scenario where you ask Siri to “find me a healthy recipe based on what's in my fridge, but hold the mushrooms – I hate those.” With ReALM, your iPhone would not only understand the references to on-screen information (the contents of your fridge) but also remember your personal preferences (dislike of mushrooms) and the broader context of finding a recipe tailored to those parameters.

This level of contextual awareness is a quantum leap from the keyword-matching approach of most current AI assistants. By training LLMs to seamlessly resolve references across three key domains – conversational, on-screen, and background – ReALM aims to create a truly intelligent digital companion that feels less like a robotic voice assistant and more like an extension of your own thought processes.

The Conversational Domain: Remembering What Came Before

In conversational AI, ReALM tackles a long-standing challenge: maintaining coherence and memory across multiple turns of dialogue. With its ability to resolve references within an ongoing conversation, ReALM could finally deliver on the promise of a natural, back-and-forth interaction with your digital assistant.

Imagine asking Siri to “remind me to book tickets for my vacation when I get paid on Friday.” With ReALM, Siri would not only understand the context of your vacation plans (potentially gleaned from a previous conversation or on-screen information) but also have the awareness to connect “getting paid” to your regular payday routine.

This level of conversational intelligence feels like a true leap forward, enabling seamless multi-turn dialogues without the frustration of constantly re-explaining context or repeating yourself.

The On-Screen Domain: Giving Your Assistant Eyes

Perhaps the most groundbreaking aspect of ReALM, however, lies in its ability to resolve references to on-screen entities – a crucial step towards creating a truly hands-free, voice-driven user experience.

Apple's research paper delves into a novel technique for encoding visual information from your device's screen into a format that LLMs can process. By essentially reconstructing the layout of your screen in a text-based representation, ReALM can “see” and understand the spatial relationships between various on-screen elements.

Consider a scenario where you're looking at a list of restaurants and ask Siri for “directions to the one on Main Street.” With ReALM, your iPhone would not only comprehend the reference to a specific location but also tie it to the relevant on-screen entity – the restaurant listing matching that description.

This level of visual understanding opens up a world of possibilities, from seamlessly acting on references within apps and websites to integrating with future AR interfaces and even perceiving and responding to real-world objects and environments through your device's camera.

The research paper on Apple's ReALM model delves into the intricate details of how the system encodes on-screen entities and resolves references across various contexts. Here's a simplified explanation of the algorithms and examples provided in the paper:

  1. Encoding On-Screen Entities: The paper explores several strategies to encode on-screen elements in a textual format that can be processed by a Large Language Model (LLM). One approach involves clustering surrounding objects based on their spatial proximity and generating prompts that include these clustered objects. However, this method can lead to excessively long prompts as the number of entities increases.

The final approach adopted by the researchers is to parse the screen in a top-to-bottom, left-to-right order, representing the layout in a textual format. This is achieved through Algorithm 2, which sorts the on-screen objects based on their center coordinates, determines vertical levels by grouping objects within a certain margin, and constructs the on-screen parse by concatenating these levels with tabs separating objects on the same line.

By injecting the relevant entities (phone numbers in this case) into the textual representation, the LLM can understand the on-screen context and resolve references accordingly.
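
The top-to-bottom, left-to-right parse described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the paper's actual Algorithm 2: the field names (`text`, `cx`, `cy`) and the pixel margin are assumptions made for the example.

```python
def parse_screen(objects, margin=10):
    """Build a textual screen layout from on-screen objects.

    objects: list of dicts with 'text' plus center coordinates 'cx', 'cy'.
    Objects whose vertical centers fall within `margin` of each other are
    treated as one line; lines are joined top to bottom.
    """
    # Sort by vertical position first, then horizontal (reading order).
    objs = sorted(objects, key=lambda o: (o["cy"], o["cx"]))
    levels, current, last_cy = [], [], None
    for o in objs:
        # Start a new vertical level once we drop past the margin.
        if last_cy is not None and o["cy"] - last_cy > margin:
            levels.append(current)
            current = []
        current.append(o)
        last_cy = o["cy"]
    if current:
        levels.append(current)
    # Objects on the same level are tab-separated; levels are newline-separated.
    return "\n".join("\t".join(o["text"] for o in level) for level in levels)
```

Feeding this textual parse to an LLM gives it a rough spatial picture of the screen without any image input.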

  2. Examples of Reference Resolution: The paper provides several examples to illustrate the capabilities of the ReALM model in resolving references across different contexts:

a. Conversational References: For a request like “Siri, find me a healthy recipe based on what's in my fridge, but hold the mushrooms – I hate those,” ReALM can understand the on-screen context (contents of the fridge), the conversational context (finding a recipe), and the user's preferences (dislike of mushrooms).

b. Background References: In the example “Siri, play that song that was playing at the supermarket earlier,” ReALM can potentially capture and identify ambient audio snippets to resolve the reference to the specific song.

c. On-Screen References: For a request like “Siri, remind me to book tickets for the vacation when I get my salary on Friday,” ReALM can combine information from the user's routines (payday), on-screen conversations or websites (vacation plans), and the calendar to understand and act on the request.

These examples demonstrate ReALM's ability to resolve references across conversational, on-screen, and background contexts, enabling a more natural and seamless interaction with intelligent assistants.

The Background Domain

Moving beyond just conversational and on-screen contexts, ReALM also explores the ability to resolve references to background entities – those peripheral events and processes that often go unnoticed by our current AI assistants.

Imagine a scenario where you ask Siri to “play that song that was playing at the supermarket earlier.” With ReALM, your iPhone could potentially capture and identify ambient audio snippets, allowing Siri to seamlessly pull up and play the track you had in mind.

This level of background awareness feels like the first step towards truly ubiquitous, context-aware AI assistance – a digital companion that not only understands your words but also the rich tapestry of your daily experiences.

The Promise of On-Device AI: Privacy and Personalization

While ReALM's capabilities are undoubtedly impressive, perhaps its most significant advantage lies in Apple's long-standing commitment to on-device AI and user privacy.

Unlike cloud-based AI models that rely on sending user data to remote servers for processing, ReALM is designed to operate entirely on your iPhone or other Apple devices. This not only addresses concerns around data privacy but also opens up new possibilities for AI assistance that truly understands and adapts to you as an individual.

By learning directly from your on-device data – your conversations, app usage patterns, and even ambient sensory inputs – ReALM could potentially create a hyper-personalized digital assistant tailored to your unique needs, preferences, and daily routines.

This level of personalization feels like a paradigm shift from the one-size-fits-all approach of current AI assistants, which often struggle to adapt to individual users' idiosyncrasies and contexts.

The ReALM-250M model achieves impressive results:

    • Conversational Understanding: 97.8
    • Synthetic Task Comprehension: 99.8
    • On-Screen Task Performance: 90.6
    • Unseen Domain Handling: 97.2

The Ethical Considerations

Of course, with such a high degree of personalization and contextual awareness comes a host of ethical considerations around privacy, transparency, and the potential for AI systems to influence or even manipulate user behavior.

As ReALM gains a deeper understanding of our daily lives – from our eating habits and media consumption patterns to our social interactions and personal preferences – there is a risk of this technology being used in ways that violate user trust or cross ethical boundaries.

Apple's researchers are keenly aware of this tension, acknowledging in their paper the need to strike a careful balance between delivering a truly helpful, personalized AI experience and respecting user privacy and agency.

This challenge is not unique to Apple or ReALM, of course – it is a conversation that the entire tech industry must grapple with as AI systems become increasingly sophisticated and integrated into our daily lives.

Towards a Smarter, More Natural AI Experience

As Apple continues to push the boundaries of on-device AI with models like ReALM, the tantalizing promise of a truly intelligent, context-aware digital assistant feels closer than ever before.

Imagine a world where Siri (or whatever this AI assistant may be called in the future) feels less like a disembodied voice from the cloud and more like an extension of your own thought processes – a partner that not only understands your words but also the rich tapestry of your digital life, your daily routines, and your unique preferences and contexts.

From seamlessly acting on references within apps and websites to anticipating your needs based on your location, activity, and ambient sensory inputs, ReALM represents a significant step towards a more natural, seamless AI experience that blurs the lines between our digital and physical worlds.

Of course, realizing this vision will require more than just technical innovation – it will also necessitate a thoughtful, ethical approach to AI development that prioritizes user privacy, transparency, and agency.

As Apple continues to refine and expand upon ReALM's capabilities, the tech world will undoubtedly be watching with bated breath, eager to see how this groundbreaking AI model shapes the future of intelligent assistants and ushers in a new era of truly personalized, context-aware computing.

Whether ReALM lives up to its promise of outperforming even the mighty GPT-4 remains to be seen. But one thing is certain: the age of AI assistants that truly understand us – our words, our worlds, and the rich tapestry of our daily lives – is well underway, and Apple's latest innovation may very well be at the forefront of this revolution.

Eventbrite’s Data Privacy Manager, Aditi Sharma on ‘Encrypt Data, not Empathy’ 

At India’s biggest DE&I summit, the Rising 2024, Aditi Sharma, engineering manager at Eventbrite, takes us on a journey to navigate from coding to leading while protecting the data of millions worldwide.

“In our journey, we aspire to be the growth engine for creators, earning the trust of consumers and serving as an antidote for isolation. As we nurture creativity, let us equally prioritise safeguarding the data privacy of our customers,” said Sharma.

With the true magic of adaptability and user-friendliness, Eventbrite stands as the premier destination for discovering live events. Powered by cutting-edge technology, the platform seamlessly connects individuals, nurturing meaningful social connections and endeavouring to unite the world through the power of live experiences.

The Need for DE&I in Data and Analytics

“Thanks to the bold conversations surrounding DEI today, I stand before you with pride, having navigated these cultural hurdles. It’s a privilege to represent my hometown of Bharatpur in Rajasthan, while also being honoured to serve at Eventbrite, a global tech company, as a woman specialising in data privacy.

Let our diversity fuel innovation, and our unity foster progress,” she added at the Rising 2024 keynote, emphasising the role of DE&I in reducing data biases and unleashing AI innovations.

Expanding on her point, she emphasised the pivotal role of ethnicity in cultivating diversity, equity, and inclusion (DE&I), underscoring how adherence to these values informs core principles like collaboration, mutual respect, and appreciation for individual strengths—vital components for achieving success.

These ingrained values, instilled from childhood, have fueled Sharma through a decade of invaluable expertise in IT. Prior to Eventbrite, she led teams at Groupon, Deloitte, EF Education First and others. As an engineering manager at Eventbrite, she currently leads data privacy initiatives, protecting customers’ data while keeping in mind regulatory frameworks such as GDPR, CCPA, and more.

The Need for Empathy

Empathy is one powerful aspect of unleashing organisational success. “When teams feel safe to admit imperfections and open themselves to feedback and criticism, it creates a culture where growth thrives, leading to organisational success,” said Sharma.

She said leaders who embrace vulnerability and empathy cultivate stronger trust among their employees, fostering an environment conducive to open communication and innovation. This, in turn, encourages “learning and growth.”

Sharma also said that the key lies in actively and openly listening, with the intent of solving problems rather than merely hearing. Leaders should establish forums for encouraging open communication with employees. This includes anonymous surveys, regular one-on-one check-ins, or simply rotating the chair to invite new perspectives and opinions.

Keeping this spirit intact, Sharma highlighted an empowering initiative, the Sisterhood Program, by the WIT committee at Eventbrite. The aim is to build a vibrant community for women within the organisation.

She believes through this program, women are connected to one another, sharing experiences and forming bonds that combat isolation and establish a dependable support network.

“To lead through the complexities of the ever-evolving world of technology, having just technical excellence is not enough for project success; it requires a profound depth of emotional intelligence to navigate,” concluded Sharma.

The post Eventbrite’s Data Privacy Manager, Aditi Sharma on ‘Encrypt Data, not Empathy’ appeared first on Analytics India Magazine.

7 Steps to Mastering Data Engineering


Data engineering refers to the process of creating and maintaining structures and systems that collect, store, and transform data into a format that can be easily analyzed and used by data scientists, analysts, and business stakeholders. This roadmap will guide you in mastering various concepts and tools, enabling you to effectively build and execute different types of data pipelines.

1. Containerization and Infrastructure as Code

Containerization allows developers to package their applications and dependencies into lightweight, portable containers that can run consistently across different environments. Infrastructure as Code, on the other hand, is the practice of managing and provisioning infrastructure through code, enabling developers to define, version, and automate cloud infrastructure.

In the first step, you will be introduced to the fundamentals of SQL syntax, Docker containers, and the Postgres database. You will learn how to initiate a database server using Docker locally, as well as how to create a data pipeline in Docker. Furthermore, you will develop an understanding of Google Cloud Provider (GCP) and Terraform. Terraform will be particularly useful for you in deploying your tools, databases, and frameworks on the cloud.
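
As a small taste of this step, the helper below composes the `docker run` command for a local Postgres server. The image tag, credentials, and database name here are arbitrary assumptions for illustration, not prescribed values.

```python
def postgres_docker_cmd(user="root", password="root", db="test_db", port=5432):
    """Build a `docker run` command string for a local Postgres 13 container."""
    return (
        "docker run -d "                      # run detached in the background
        f"-e POSTGRES_USER={user} "           # superuser name
        f"-e POSTGRES_PASSWORD={password} "   # superuser password
        f"-e POSTGRES_DB={db} "               # database created on startup
        f"-p {port}:5432 "                    # map host port to Postgres
        "postgres:13"
    )
```

Running the returned command in a shell starts a database you can then point a local pipeline at.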

2. Workflow Orchestration

Workflow orchestration manages and automates the flow of data through various processing stages, such as data ingestion, cleaning, transformation, and analysis. It is a more efficient, reliable, and scalable alternative to triggering each stage manually.

In the second step, you will learn about data orchestration tools like Airflow, Mage, or Prefect. They are all open source and come with multiple essential features for observing, managing, deploying, and executing data pipelines. You will learn to set up Prefect using Docker and build an ETL pipeline using Postgres, Google Cloud Storage (GCS), and BigQuery APIs.
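
To make the idea concrete, the pure-Python sketch below mimics what an orchestrator does: extract, transform, and load are declared as separate tasks and run in dependency order. This is an analogy, not the Prefect API; the data and function names are invented for the example.

```python
def extract():
    # Stand-in for pulling raw rows from an API or a file.
    return [{"city": "NYC", "trips": "120"}, {"city": "LA", "trips": "95"}]

def transform(rows):
    # Clean types so the warehouse receives integers, not strings.
    return [{"city": r["city"], "trips": int(r["trips"])} for r in rows]

def load(rows, warehouse):
    # Append the cleaned rows to the target store; return rows loaded.
    warehouse.extend(rows)
    return len(rows)

def etl_flow(warehouse):
    # An orchestrator runs these tasks in dependency order, retries
    # failures, and records each run for observability.
    return load(transform(extract()), warehouse)
```

An orchestrator adds scheduling, retries, and monitoring on top of exactly this kind of task graph.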

Check out the 5 Airflow Alternatives for Data Orchestration and choose the one that works better for you.

3. Data Warehousing

Data warehousing is the process of collecting, storing, and managing large amounts of data from various sources in a centralized repository, making it easier to analyze and extract valuable insights.

In the third step, you will learn about data warehousing with either Postgres (local) or BigQuery (cloud). You will learn about the concepts of partitioning and clustering, and dive into BigQuery's best practices. BigQuery also provides machine learning integration, letting you train models on large datasets, tune hyperparameters, preprocess features, and deploy models. It is like SQL for machine learning.
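
The payoff of partitioning can be shown in plain Python: rows are bucketed by a date key, so a date-filtered query reads one bucket instead of scanning every row. This is a toy model of the pruning a partitioned BigQuery table performs at storage scale.

```python
from collections import defaultdict

def build_partitions(rows, key="event_date"):
    """Bucket rows by their partition key, like a date-partitioned table."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return parts

def query_partition(parts, day):
    # Partition pruning: only the matching bucket is ever touched.
    return parts.get(day, [])
```

Clustering refines this further by ordering rows within each partition so range filters skip even more data.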

4. Analytics Engineering

Analytics Engineering is a specialized discipline that focuses on the design, development, and maintenance of data models and analytical pipelines for business intelligence and data science teams.

In the fourth step, you will learn how to build an analytical pipeline using dbt (Data Build Tool) with an existing data warehouse, such as BigQuery or PostgreSQL. You will gain an understanding of key concepts such as ETL vs ELT, as well as data modeling. You will also learn advanced dbt features such as incremental models, tags, hooks, and snapshots.
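
The incremental-model idea can be sketched without dbt: only source rows newer than the target's high-water mark are appended. dbt expresses this in SQL with its `is_incremental()` macro; the Python below is just an analogy, with an assumed `updated_at` column.

```python
def incremental_load(target, source, key="updated_at"):
    """Append only source rows newer than anything already in target."""
    # High-water mark: the newest timestamp already loaded (None if empty).
    high_water = max((r[key] for r in target), default=None)
    new_rows = [r for r in source if high_water is None or r[key] > high_water]
    return target + new_rows
```

On each run only the delta is processed, which is what makes incremental models cheap on large tables.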

In the end, you will learn to use visualization tools like Google Data Studio and Metabase for creating interactive dashboards and data analytic reports.

5. Batch Processing

Batch processing is a data engineering technique that involves processing large volumes of data in batches (every minute, hour, or even days), rather than processing data in real-time or near real-time.

In the fifth step of your learning journey, you will be introduced to batch processing with Apache Spark. You will learn how to install it on various operating systems, work with Spark SQL and DataFrames, prepare data, perform SQL operations, and gain an understanding of Spark internals. Towards the end of this step, you will also learn how to start Spark instances in the cloud and integrate it with the data warehouse BigQuery.
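
Stripped of Spark's distribution machinery, batch processing reduces to chunking and aggregating, which this small sketch illustrates; Spark parallelizes the same pattern across the partitions of a cluster.

```python
def batched(records, size):
    """Yield fixed-size chunks of the input, like micro-batches of rows."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def batch_sums(records, size):
    # Aggregate each batch independently; the partial results can then
    # be merged, which is exactly what makes the work parallelizable.
    return [sum(chunk) for chunk in batched(records, size)]
```

Because each chunk is independent, the per-batch work can run on separate executors and be combined at the end.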

6. Streaming

Streaming refers to the collection, processing, and analysis of data in real-time or near real-time. Unlike traditional batch processing, where data is collected and processed at regular intervals, streaming data processing allows for continuous analysis of the most up-to-date information.

In the sixth step, you will learn about data streaming with Apache Kafka. Start with the basics and then dive into integration with Confluent Cloud and practical applications involving producers and consumers. Additionally, you will learn about stream joins, testing, windowing, and the use of ksqlDB and Kafka Connect.
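
Windowing, one of the concepts listed above, can be shown in miniature: events are bucketed into fixed (tumbling) time windows and counted per window. ksqlDB does this declaratively over Kafka topics; the sketch below is just the core idea, with event timestamps assumed to be epoch seconds.

```python
def tumbling_window_counts(events, window_secs=60):
    """Count events per fixed, non-overlapping time window.

    events: list of (epoch_seconds, payload) tuples.
    Returns {window_start_seconds: count}.
    """
    counts = {}
    for ts, _ in events:
        window_start = ts - (ts % window_secs)  # align to window boundary
        counts[window_start] = counts.get(window_start, 0) + 1
    return counts
```

Hopping and session windows follow the same principle with different bucketing rules.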

If you wish to explore different tools for various data engineering processes, you can refer to 14 Essential Data Engineering Tools to Use in 2024.

7. Project: Build an end-to-end Data Pipeline

In the final step, you will use all the concepts and tools you have learned in the previous steps to create a comprehensive end-to-end data engineering project. This will involve building a pipeline for processing the data, storing the data in a data lake, creating a pipeline for transferring the processed data from the data lake to a data warehouse, transforming the data in the data warehouse, and preparing it for the dashboard. Finally, you will build a dashboard that visually presents the data.
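
Schematically, the final project chains the earlier steps into one flow. The sketch below compresses the lake, warehouse, and transform stages into in-memory stand-ins purely to show the shape of the pipeline; real storage systems replace each list in practice.

```python
def end_to_end(raw_rows):
    # 1) Land raw data in the "lake" untouched.
    lake = list(raw_rows)
    # 2) Move lake data into the "warehouse" with types cleaned.
    warehouse = [{"city": r["city"], "trips": int(r["trips"])} for r in lake]
    # 3) Transform inside the warehouse into a dashboard-ready aggregate.
    dashboard = {}
    for r in warehouse:
        dashboard[r["city"]] = dashboard.get(r["city"], 0) + r["trips"]
    return dashboard
```

In the actual project, an orchestrator schedules each stage and a BI tool renders the final aggregate.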

Final Thoughts

All the steps mentioned in this guide can be found in the Data Engineering ZoomCamp. This ZoomCamp consists of multiple modules, each containing tutorials, videos, questions, and projects to help you learn and build data pipelines.

In this data engineering roadmap, we have learned the various steps required to learn, build, and execute data pipelines for processing, analysis, and modeling of data. We have also learned about both cloud applications and tools as well as local tools. You can choose to build everything locally or use the cloud for ease of use. I would recommend using the cloud as most companies prefer it and want you to gain experience in cloud platforms such as GCP.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
