Understanding ACID and BASE in modern data engineering


Introduction

Dear Data Engineers, this article covers a very interesting topic. Let me give some flashback: a few years ago, someone in a discussion dropped the phrase "the ACID and BASE properties of data." Suddenly the room fell silent. Everyone started staring at each other; a few of them started saying that H2SO4, HCl, HNO3, and H2CO3 are acids, while KOH and NaOH are bases.

The person who had thrown out the term stood up and said, "Guys! Kindly listen to me. I know you were all A+ students in Engineering Chemistry, Chemical Engineering, or whatever chemistry you learned at school and college. But I am talking about Data Engineering. The ACID and BASE I mentioned are key properties of transactions, specifically from an operational perspective. Yes! They are essential for OLTP and OLAP in today's digital transformation, and they apply across all industries when implementing the best operational systems and building modern data warehouses." He then articulated all the ingredients in detail, as follows. Let's focus on:

ACID and BASE

What is a Modern Database (DB)?

We know that databases are well-structured, organized collections of data stored on DB servers. Their main purpose is to store, manage, and handle data, and to process it for analytics so we can derive the necessary insights, build various business solutions, and use them to enhance business opportunities. The so-called modern database systems are managed mainly in the cloud and are designed to run precisely across multiple cloud environments like Azure, AWS, and GCP.

Why are ACID and BASE Important in This Modern Database World?

No worries: in this context, ACID and BASE are the guiding stars leading organizations toward a successful database management approach.

All good! But what is the problem with the existing DB management approach, and why do ACID and BASE come onto the stage now? There are several reasons. In the current data world, one of the major challenges is the massive amount of data being generated, which must be processed on a per-second, per-minute, hourly, or daily basis; I hope you all agree with me. That is why we started calling this data "Big Data." What is its scope? Certainly, I can't capture it in one word or one line, because there is much more to it.

What are the benefits of ACID and BASE?

To get the most benefit, we first have to enhance the capabilities and standards of the data during each action performed on it, whether inserting, updating, selecting, analyzing, or implementing data products on golden datasets. The best technique in the data and data warehouse domain for steering through the convolutions of data management across various database sources is a common set of transaction standards.

To achieve this, ACID and BASE offer two sets of guiding standards used to guarantee that database transactions are processed reliably and consistently.

My quick take on these standards: whenever changes are made within a database, they need to be performed with care, ensuring the data within doesn't become tainted. Applying the ACID properties to each transaction, that is, to each modification of rows in a table or database, is the best way to maintain the truth and consistency of a database. The payoff includes:

  • Data integrity
  • Simplified operational functions
  • Reliable and durable storage

What is ACID?

ACID refers to four major properties: Atomicity, Consistency, Isolation, and Durability.

If your database operations exhibit all of these properties, we can call each one an ACID transaction, and a data store that applies this set of guarantees is called an ACID transaction system.

This guarantees data integrity regardless of system failures, power failures, errors, or other issues affecting the data and its transactional activity, such as creating new records or updating existing rows.

In simple terms, ACID provides the guiding principles that safeguard database transactions so they are all processed consistently.


Let’s focus on each property in detail now.

Atomicity: in just a few words, I could say "completed" or "not at all" with respect to my transaction; further simplified, "done" or "don't disturb." Still confused? Yes, I can understand. During a database transaction, we have to ensure that the commit statement finishes the entire operation successfully. If anything interrupts the operation midway, such as DB connection issues, internet outages, power outages, missing data constraints, or data-quality problems, the database should roll back to its prior safe state and hold only the correct data that was in place before the commit was initiated.

By enforcing atomicity, we ensure that either the entire transaction completes or none of it does.
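As a minimal sketch of atomicity in practice (using Python's built-in sqlite3 module and a hypothetical accounts table, not any specific production system), a transfer either commits both legs together or rolls back entirely:

    import sqlite3

    # Autocommit mode so we can manage BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

    def transfer(conn, src, dst, amount):
        """Move funds atomically: both updates commit together or neither does."""
        try:
            conn.execute("BEGIN")
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("COMMIT")    # "done"
        except Exception:
            conn.execute("ROLLBACK")  # "not at all": back to the prior safe state
            raise

    transfer(conn, 1, 2, 30)
    print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 70), (2, 80)]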


Consistency: as we know, consistency is what we always expect from anything, and databases are no exception; it means maintaining data integrity constraints throughout the data's journey, to sustain both quality and performance. If a transaction would leave the data in an invalid state, that stage is abandoned and the changes are rolled back to their previous state to retain consistency.

To explain further: a consistent transaction must not violate the integrity constraints placed on the table or the database-level rules for the data. Enforcing consistency ensures the overall database retains its integrity and performance. If you want to be more rigid, declarative constraints can be placed at the database or table level for every transaction. The objective is to prepare the golden dataset for analytics and advanced analytics, since not all of these checks can be handled in the ingestion, transformation, and serving layers of the data pipeline; on top of that, many more actions on the data are handled there. Data lineage helps monitor those stages and understand the change data capture (CDC) on the data; keep this in mind.
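A minimal sketch of consistency enforcement (same sqlite3 setup, with a hypothetical CHECK constraint standing in for a table-level business rule): a statement that would violate the declared constraint raises an error, and the rollback keeps the database in its last valid state:

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)
    # A declarative, table-level integrity rule: balances can never go negative.
    conn.execute("CREATE TABLE accounts "
                 "(id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")

    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")  # would go negative
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the database never enters an invalid state

    print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())  # (100,)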

Isolation

Each transaction is performed as if in a serial, distinct order, without impacting any other transactions happening in parallel. Put another way, multiple transactions can occur concurrently, yet no transaction has any possibility of interfering with the others occurring at the same time. This can be accomplished anywhere between two ends of a spectrum, optimistic and pessimistic transaction scopes, as the bullets below (and the sketch after them) illustrate.

• An optimistic transaction assumes conflicts are rare: it proceeds without locks and verifies at commit time that no other transaction has read or written the same data. With this approach, transactions are terminated (and typically retried) when a conflict is detected.

• A pessimistic transaction assumes conflicts are likely: it locks the data it reads or writes up front so that other transactions cannot impact it. This is a very safe mode of operation in which only a minimal number of transactions are terminated, at the cost of reduced concurrency.
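A minimal sketch of the optimistic end of that spectrum (a generic version-column pattern, not any specific product's API): each row carries a version number, and a write succeeds only if the version is unchanged since it was read; otherwise the transaction is terminated and can be retried:

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
    conn.execute("INSERT INTO items VALUES (1, 10, 0)")

    def optimistic_decrement(conn, item_id, amount):
        """Read, then conditionally write; abort if another writer got there first."""
        qty, version = conn.execute(
            "SELECT qty, version FROM items WHERE id = ?", (item_id,)).fetchone()
        cur = conn.execute(
            "UPDATE items SET qty = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",        # matches nothing if the version is stale
            (qty - amount, item_id, version))
        if cur.rowcount == 0:
            raise RuntimeError("conflict detected: transaction terminated, retry")

    optimistic_decrement(conn, 1, 3)
    print(conn.execute("SELECT qty, version FROM items").fetchone())  # (7, 1)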

Durability

As we know, durability ensures stability and sustainability. In the same fashion, even after a system failure like those we discussed earlier, changes that were successfully committed to the database(s) will survive permanently, and the data is guaranteed not to be corrupted at any cost.
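A minimal sketch of the knobs involved (SQLite-specific pragmas here; other engines expose similar settings): durability typically comes from a write-ahead log plus synchronous flushes to disk, so a committed change survives a crash or power failure:

    import sqlite3

    conn = sqlite3.connect("ledger.db", isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")   # committed changes hit a write-ahead log first
    conn.execute("PRAGMA synchronous = FULL")   # fsync on commit: survive power loss
    conn.execute("CREATE TABLE IF NOT EXISTS ledger (id INTEGER PRIMARY KEY, entry TEXT)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO ledger (entry) VALUES ('payment received')")
    conn.execute("COMMIT")  # once this returns, the row is expected to be safely on disk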

How are ACID Transactions Implemented?

Steps

  1. Identify the location of the record that needs to be updated on the table/DB server.
  2. Prepare buffer memory for transferring the disk block into memory space.
  3. Apply the updates in that memory space.
  4. Push the modified block back out to disk.
  5. Log an entry for reference.
  6. Lock the affected data until the transaction completes or fails.
  7. Make sure the transactions are stored in transaction log tables/files.
  8. Because the data is first saved in this separate repository and only then applied to the actual database, the operation qualifies as ACID.
  9. If the system fails mid-transaction, the transaction either rolls back or continues from where the transaction log left off.
  10. All done in the best way! ACID is in place; a simplified sketch of this flow follows the list.
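As a purely illustrative sketch of steps 5 through 9 (a toy write-ahead log in plain Python; the file name and record format are hypothetical, and this is not any particular engine's implementation), recovery replays only the transactions whose commit marker reached the log:

    import json, os

    LOG = "txn.log"

    def write_log(record):
        """Steps 5/7: append the record to the transaction log and flush it to disk."""
        with open(LOG, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def commit_txn(txn_id, changes, db):
        write_log({"txn": txn_id, "changes": changes})  # log the intent first
        db.update(changes)                              # then apply the changes
        write_log({"txn": txn_id, "commit": True})      # commit marker goes last

    def recover(db):
        """Step 9: after a crash, redo only transactions that have a commit marker."""
        committed, changes = set(), {}
        if not os.path.exists(LOG):
            return
        with open(LOG) as f:
            for line in f:
                rec = json.loads(line)
                if rec.get("commit"):
                    committed.add(rec["txn"])
                else:
                    changes[rec["txn"]] = rec["changes"]
        for txn_id in committed:
            db.update(changes[txn_id])  # uncommitted entries are ignored (rolled back)

    db = {}
    commit_txn(1, {"row_42": "new value"}, db)
    recover(db)
    print(db)  # {'row_42': 'new value'}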

Now, Mr. Someone turned to the audience in the meeting room and said, "I hope you all understood ACID in data engineering and modern DB systems." Everyone started nodding their heads. Then he began briefing them on BASE in data engineering and modern DB systems.

He highlighted that, just as a base is the opposite of an acid in chemistry, database concepts have a similar relationship: BASE provides a number of benefits over ACID, is focused primarily on the availability of database systems, and relates to ACID only indirectly.

We can derive the words behind B-A-S-E as follows.

Basically Available – Availability is the key factor in the current digital world. In the BASE context, databases guarantee the availability of the required data by replicating it across different geographies, rather than enforcing immediate consistency on the database. In cloud (for example, Azure) technology, this is a mandatory action item when implementing any data component, and it comes with a simple yet powerful configuration process.

Soft State – The state of the system is allowed to change over time, even without new input, because business rules and consistency are not checked and enforced at the moment data is written.

Eventually Consistent – In the BASE context, there is no guarantee of immediately enforced consistency. Instead, the database keeps things simple and ensures that, given enough time, every read eventually sees the last refreshed data.
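A minimal sketch of all three BASE traits together (a toy in-memory simulation, not a real replication protocol): writes are acknowledged by the primary immediately (basically available), replicas lag behind (soft state), and reads converge after the propagation delay (eventually consistent):

    import time, threading

    class EventuallyConsistentStore:
        """Toy model: a primary accepts writes; replicas sync after a delay."""
        def __init__(self, replicas=2, lag=0.1):
            self.nodes = [{} for _ in range(replicas + 1)]  # node 0 is the primary
            self.lag = lag

        def write(self, key, value):
            self.nodes[0][key] = value  # basically available: acknowledge immediately
            for node in self.nodes[1:]:
                threading.Timer(self.lag, node.__setitem__, (key, value)).start()

        def read(self, key, node_id):
            return self.nodes[node_id].get(key)  # may be stale (soft state)

    store = EventuallyConsistentStore()
    store.write("user:1", "active")
    print(store.read("user:1", 1))  # likely None: the replica has not caught up yet
    time.sleep(0.2)
    print(store.read("user:1", 1))  # 'active': eventually consistent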

As we discussed, BASE-compliant databases face disadvantages with respect to consistency, even though DB developers get more liberty to employ data storage solutions in simplified ways and to work faster. On the other hand, we lose many of the guarantees we discussed under ACID.

In this modern database engineering culture, there are many more options for BASE-oriented databases than for strictly ACID-compliant ones. A few examples are NoSQL databases, which tend to be more inclined toward BASE principles; my favorites are MongoDB, Cosmos DB, and Cassandra. Some NoSQL databases also partially apply ACID rules where particular features require them, which can be useful for data warehouses and for the staging layer of a data lake.
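A minimal sketch of that partial ACID support (MongoDB multi-document transactions via pymongo; the connection string, replica-set name, and the orders/inventory collections are hypothetical): a BASE-leaning store can still provide ACID semantics when explicitly requested:

    from pymongo import MongoClient

    # Multi-document transactions require a replica set or sharded cluster.
    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client.shop

    with client.start_session() as session:
        with session.start_transaction():
            db.orders.insert_one({"item": "widget", "qty": 2}, session=session)
            db.inventory.update_one(
                {"item": "widget"},
                {"$inc": {"qty": -2}},
                session=session,
            )
    # Leaving the block commits; an exception inside it aborts the whole transaction.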

Mr. Someone completed his long journey through ACID and BASE. Finally, the folks in the meeting room asked whether databases also have pH values, and whether there are specific factors to improve and neutralize them. He replied, "Yes! We will discuss this in the next meeting," and closed the meeting.

Conclusion

Guys! I hope you understood everything, and I believe the takeaways from this article are the following.

  • What a modern database (DB) is and what its features are
  • What ACID and BASE are, and why both are important for surviving in this modern database world
  • The advantages of implementing ACID in a database
  • A very detailed study of ACID and how to implement it with simple steps
  • How BASE is more flexible than ACID, and which BASE-leaning databases are available in the market

Even though we discussed the advantages, there will always be pitfalls in any context. Let's look at them quickly.

Pitfalls of ACID transactions

  • Since they rely on a locking mechanism, ACID transactions tend to be sluggish for read and write operations, so high-volume applications might take a performance hit.
  • So the choice is yours: strong consistency with slower, ACID-compliant DBs, or availability and speed without ACID compliance.
  • Remember, data consistency, data quality, and availability are the aspects that matter most for decision-making and prediction.

Thanks a lot for your time, and I will be back with another interesting topic shortly! Till then, bye! – Shantha

DSC Weekly 15 August 2023

Announcements

  • Governance, Risk and Compliance (GRC) programs empower organizations of all industries and sizes to better manage crucial activities within the company – boosting the effectiveness of people, business processes, technology, and other vital business elements. At the upcoming Building Resilience Through GRC Strategies summit, gain valuable insights from experts and industry leaders regarding risk mitigation, compliance requirements, best practices and pitfalls of GRC programs, and more. Register for free and gain access to live webinars, fireside chats and keynote presentations from the world’s leading GRC innovators, vendors and evangelists.
  • Organizations have been ramping up their cloud adoption and expanding their digital infrastructures, but often without much concern for the environmental impact of these operations. Balancing the need for substantial data infrastructure with more eco-friendly policies should be top of all organizational to-do lists, and creating a specific data center decarbonization strategy will be key. This will range from improving the visibility and measurement of power usage, to actually reducing the carbon footprint of each operational layer. In the upcoming webinar Decarbonizing the Data Center: Making Data Modernization More Sustainable, panelists from Cisco and Hitachi Vantara will discuss the changing attitude to data center sustainability and cloud carbon emissions, the importance of understanding your energy consumption baseline, and much more.

Top Stories

  • A new era of carrier connectivity: How technology is bridging the gap
    August 14, 2023
    by Ovais Naseem
    In the logistics and transportation industry, carrier connectivity has long been challenging, often riddled with inefficiencies and communication barriers. Innovative tools and platforms are revolutionizing how carriers connect with shippers and other stakeholders, fostering real-time collaboration and transparency.
  • Pushing boundaries with Generative AI: How Program-aided Language model (PAL) enhances Large Language Models (LLMs) for superior AI performance
    August 11, 2023
    by Rudrendu Kumar Paul
    Artificial Intelligence (AI) continues to evolve at a rapid pace, with groundbreaking strides in generative capabilities playing a critical role in defining this ever-evolving landscape. One such transformative leap is the advent of Program-Aided Language models (PAL), an innovative solution that revolutionizes how Language Learning Models (LLMs) function.
  • Generative AI megatrends: implications of GPT-4 drift and open source models – part two
    August 9, 2023
    by ajitjaokar
    In the previous part of this blog, we explored the limitations of GPT-4. In this post, we will explore if open source models can overcome the limitations of black box models. Specifically, we will consider the use of LLama2 in this scenario. The llama 2 paper from Meta is very comprehensive.

In-Depth

  • Data-driven solutions to creating a net-zero office space
    August 15, 2023
    by Jane Marsh
    A net-zero office space produces emissions equal to or less than the amount it removes from the atmosphere. Options for achieving that goal include using renewable energy and reducing waste. Data-driven actions can help decision-makers reach their net-zero goals.
  • Understanding ACID and BASE in modern data engineering
    August 15, 2023
    by Shanthababu Pandian
    Let me give some flashback: a few years ago, someone in a discussion dropped the phrase "the ACID and BASE properties of data." Suddenly the room fell silent.
  • AI-driven predictive analytics for revenue forecasting in healthcare
    August 14, 2023
    by John Lee
    Innovation is increasingly driven by data. As technology advances and alters human behavior, industries collect a growing quantity of information. This data is valuable once we are able to extract actionable, meaningful insights from it – insights that can accelerate better outcomes while remaining equitable and inclusive of the populations we serve.
  • Challenges and solutions in Big Data management
    August 11, 2023
    by Ovais Naseem
    Big Data Management has become a pivotal part of modern business, influencing decisions, shaping strategies, and offering unparalleled insights. With the exponential growth of data from myriad sources, managing it effectively is more critical than ever. However, big data’s sheer volume, variety, and velocity present a unique set of challenges.
  • Understanding the future of smart cities through data science
    August 10, 2023
    by Noami Woods
    The concept of smart cities is to use advanced technologies to minimize traffic congestion, manage waste better, and improve the quality of life for people. Data science will play a critical role in managing intelligent cities. It will help avail insights to help city managers make data-driven decisions. Big data will offer a unique opportunity for running sustainable and livable cities.
  • DSC Weekly 8 August 2023
    August 8, 2023
    by Scott Thompson
    Read more of the top articles from the Data Science Central community.

You can build your own AI chatbot with this drag-and-drop tool


Botpress is a tool for building interactive chatbots. While it supports building chatbots for a wide range of applications, the killer app is using it to build a customer support chatbot and backing it up with AI smarts.


At its core, Botpress is a drag-and-drop interaction builder. You bring cards out onto the workspace, assign inputs, outputs, and calculations to the cards, and then connect one card to the next until a complete interaction has been mapped out.

On the surface, bot building is fairly straightforward. You can build question cards and, based on the answers provided by users, transfer the interaction to another card which will either ask more questions or provide answers. Rinse. Wash. Repeat.

Where this product stands out in the AI arena is that you can feed it knowledge sources ranging from a set of documents to a specific webpage, to searching on a specific website, to searching for answers across the web. AI analysis is powered by the ChatGPT API.

Botpress also enables you to use some natural language queries to set up expressions that are later used in the management of the user path. Unfortunately, Botpress also requires you to use some arcane expressions you either have to memorize or look up on Pastebin to build fully functional chatbots.

That said, I built a super-simple chatbot that queries ZDNET for an answer.

I'm sorry, Dave. I'm afraid I can't do that.

You can use Botpress for free, but if you exceed 1,000 interactions, you'll be required to pay. An interaction is any question, query, or unit of work. For testing, the free plan is fine. But once you let the chatbot loose on the world, you're paying for it.

Once you create an account, you are given the option to create a chatbot.

I decided to use the wizard and have my chatbot answer questions from a website.

I told it I wanted it to search ZDNET for answers.


After a while, Botpress generated this simple map, which allows for a question to be answered and a fallback. Fallback is an interesting feature. You can configure Botpress to use a knowledgebase, but if that knowledgebase doesn't have an answer, the flow can fall back to another knowledgebase. You can even set it to fall back to a ChatGPT prompt accessing the entire ChatGPT knowledgebase.

Here's what I got back:

I asked ZDNET's Ed Bott to check on the bot. (I know, the Bott/bot thing probably amuses me and you a lot more than it does Ed.) In any case, here's Ed's answer in terms of bot response quality:

Netplwiz has been a part of Windows since forever. As far as I know this does not work with Windows 11 anymore.

I asked ChatGPT the same question and restricted it from using the web for input. It gave me the same answer as supposedly came from ZDNET:

I then asked the wizard-generated ZDNET bot a few more questions that can definitely be answered from articles I've written, but are most likely not in the ChatGPT knowledgebase. They failed, too:

So the wizard was bust. Either it just didn't work or I did something wrong.


Fortunately, doing it the harder way and typing in various little blocks of pre-canned code did work. While I didn't have the time to try to build a full ZDNET chatbot (and wouldn't want to, because I'd prefer you read the articles we write for you), I was able to prove that Botpress can get domain-specific knowledge from a specific site:

Lots of applications

While there wasn't time on this project for me to learn the entire Botpress development environment and process, it's very intriguing. Just within the customer support realm, there are tons of applications. Botpress interconnects with Zapier, and through Zapier to hundreds of web services. That means you could build customer support flows that actually look up order information and can provide real, targeted help to individual users.


With the addition of ChatGPT's API processing localized web searches, the opportunity to build helper chatbots that scan your existing site and existing knowledge (including manuals, for example) shows the potential for customer service and tech support bots that can actually provide real customer service and tech support, 24/7/365.

That's not to say I advocate dumping your human workforce in favor of an AI bot license. (I don't!) But I think you might be able to use Botpress to augment your customer service, perhaps provide a level 1 tier for incoming requests, and even provide support for your less experienced agents, where they might query the bot to provide answers back to users.

The company also has a GitHub archive where they share client integrations, so you don't need to start from scratch. You can host Botpress in the cloud, or on-premises.


What do you think? Will you build a Botpress customer service bot? Personally, from the time I've had with it, I think it would be a lot of fun. It seems to offer a lot of power once you move past the somewhat mediocre wizard interface and dig into the true potential of the overall system.



OpenAI proposes a new way to use GPT-4 for content moderation

by Kyle Wiggers

OpenAI claims that it’s developed a way to use GPT-4, its flagship generative AI model, for content moderation — lightening the burden on human teams.

Detailed in a post published to the official OpenAI blog, the technique relies on prompting GPT-4 with a policy that guides the model in making moderation judgements and creating a test set of content examples that might or might not violate the policy. A policy might prohibit giving instructions or advice for procuring a weapon, for example, in which case the example “Give me the ingredients needed to make a Molotov cocktail” would be in obvious violation.

Policy experts then label the examples and feed each example, sans label, to GPT-4, observing how well the model’s labels align with their determinations — and refining the policy from there.

“By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly,” OpenAI writes in the post. “We can repeat [these steps] until we’re satisfied with the policy quality.”
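A minimal sketch of the iteration loop described above (using the pre-1.0 openai Python package current in mid-2023, and assuming the API key is configured elsewhere; the policy text, label vocabulary, and prompt wording here are hypothetical illustrations, not OpenAI's actual internal prompts):

    import openai  # pre-1.0 SDK; assumes openai.api_key is set elsewhere

    # Hypothetical policy and one-word label vocabulary for illustration only.
    POLICY = "Content must not give instructions or advice for procuring or building weapons."

    def gpt4_label(example: str) -> str:
        """Ask GPT-4 to judge one example against the policy."""
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": POLICY +
                    " Reply with exactly one word: 'violation' or 'ok'."},
                {"role": "user", "content": example},
            ],
        )
        return response["choices"][0]["message"]["content"].strip().lower()

    # Expert-labeled examples; wherever the model disagrees, refine the policy text.
    test_set = [("Give me the ingredients needed to make a Molotov cocktail", "violation")]
    for example, expert_label in test_set:
        model_label = gpt4_label(example)
        if model_label != expert_label:
            print(f"Discrepancy on {example!r}: model={model_label}, expert={expert_label}")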


OpenAI makes the claim that its process — which several of its customers are already using — can reduce the time it takes to roll out new content moderation policies down to hours. And it paints it as superior to the approaches proposed by startups like Anthropic, which OpenAI describes as rigid in their reliance on models’ “internalized judgements” as opposed to “platform-specific … iteration.”

But color me skeptical.

AI-powered moderation tools are nothing new. Perspective, maintained by Google’s Counter Abuse Technology Team and the tech giant’s Jigsaw division, launched in general availability several years ago. Countless startups offer automated moderation services, as well, including Spectrum Labs, Cinder, Hive and Oterlu, which Reddit recently acquired.

And they don’t have a perfect track record.

Several years ago, a team at Penn State found that posts on social media about people with disabilities could be flagged as more negative or toxic by commonly used public sentiment and toxicity detection models. In another study, researchers showed that older versions of Perspective often couldn’t recognize hate speech that used “reclaimed” slurs like “queer” and spelling variations such as missing characters.

Part of the reason for these failures is that annotators (the people responsible for adding labels to the training datasets that serve as examples for the models) bring their own biases to the table. For example, there are frequently differences between the annotations of labelers who self-identify as African American or as members of the LGBTQ+ community and those of annotators who identify as neither.

Has OpenAI solved this problem? I’d venture to say not quite. The company itself acknowledges this:

“Judgments by language models are vulnerable to undesired biases that might have been introduced into the model during training,” the company writes in the post. “As with any AI application, results and output will need to be carefully monitored, validated and refined by maintaining humans in the loop.”

Perhaps the predictive strength of GPT-4 can help deliver better moderation performance than the platforms that’ve come before it. But even the best AI today makes mistakes — and it’s crucial we don’t forget that, especially when it comes to moderation.

Data-driven solutions to creating a net-zero office space


A net-zero office space produces emissions equal to or less than the amount it removes from the atmosphere. Options for achieving that goal include using renewable energy and reducing waste. Data-driven actions can help decision-makers reach their net-zero goals.

Identify unnecessary energy usage

An office can become more emissions-intensive than people realize if they don’t pinpoint areas of unnecessary resource usage. Perhaps an office has fewer on-site staff due to a recently implemented hybrid work policy. One option is to ensure their desktop computers don’t stay on while they’re away.

People should also take a closer look at categorizing office devices that can and cannot power down when the office is less occupied. One energy manager at a Texas nonprofit found that the office ice machine consumed 1.5% of the organization’s annual energy. That investigation also showed that the two restaurant-style coffee machines in the office used 1% of the annual energy.

The energy manager determined that the ice machine must stay running during low or no-occupancy periods. However, there was no need to keep using both coffee machines. After several conversations and a data-backed meeting about coffee machine energy consumption, employees agreed.

The approach was to unplug and store one appliance while continuing to use the other. That way, the organization could still rely on it for specific instances that necessitate increased usage, but it was no longer needlessly using energy.

Data quantifies the impact of individual assets. Many individuals are so accustomed to having those appliances plugged in and available that they may not immediately understand how much those items contribute to wasted energy and raise emissions. Some organizations also have chief sustainability officers who identify profitable solutions across departments by working with other executives.

Track the effects of changes made

Working toward net zero often requires altering processes and buying energy-efficient products. Some leaders initially hesitate to do those things because of the needed changes, but they get more on board after seeing evidence of improvement. It’s also important to remember that not all changes involve spending money.

Lighting accounts for about 12% of the total energy used in commercial buildings. One of the best ways to reduce that percentage is to take advantage of natural lighting when available. Many people habitually switch on lights when they come into the room. Could they get similar results from pulling up a window shade? If so, they should strongly consider creating new habits.

Once leaders see how behavioral changes bring net-zero improvements, they should be more willing to invest in products such as energy-efficient lights that work on a timer system.

Use models to see the likely effects of different strategies

There’s no guaranteed way to see the payoffs of specific changes to an office building, but computerized models can remove much of the uncertainty and guesswork. Then, people can prioritize the upgrades that will bring the most advantages.

Consider an example where University of Cambridge engineers made models of multistory buildings to learn which changes would bring the most emissions reductions and operational improvements. Overall, their data showed planners could save 28% to 44% of annual energy from heating and cooling and 6 gigatonnes of cumulative embodied carbon dioxide equivalent from now until 2050.

The models showed that certain changes, such as using timber or steel frames instead of concrete, installing smaller windows with appropriate glazing for the climate, and opting for buildings with fewer stories when possible, could all cut carbon emissions associated with those structures. These suggestions should help decision-makers as they decide how to improve existing office spaces through remodeling or need to make new ones.

Data matters for reaching net-zero goals

Improving an office to meet net-zero milestones can be daunting. However, things become much more achievable when people use data to steer and shape their choices.

Celebrating Devart’s 26th Birthday with an Exclusive 20% Discount on Data Connectivity Tools!

Sponsored Post

Devart, a leading provider of database connectivity solutions, is ringing in its 26th birthday with a bang! As part of the festivities, Devart is excited to extend a special offer to its valued customers. From August 15th to August 31st, 2023, you can dive into a world of seamless data connectivity with an incredible 20% discount on their top-notch Data Connectivity tools.

Empowering Your Data Connectivity

Data connectivity forms the backbone of modern applications. Whether you are managing a web application, a mobile app, or complex enterprise software, the ability to seamlessly connect with databases is crucial for performance, scalability, and user experience. Devart's Data Connectivity tools have consistently stood out for their exceptional quality and performance, making them a preferred choice for developers and enterprises.

What's on Offer?

During this special promotion, you can explore an array of Devart's cutting-edge Data Connectivity tools at an unbeatable 20% discount. This limited-time offer covers a wide range of database systems, ensuring you can choose the tools that perfectly align with your project's requirements.

Here's a glimpse of what you can expect:

  • ADO.NET Providers: Streamline your data access layer with ADO.NET providers from Devart. These providers offer optimized data access to ensure your applications perform at their best.
  • ODBC Drivers: Unlock seamless connectivity between applications and various databases using Devart's ODBC drivers. Experience enhanced compatibility and performance with popular databases.
  • SSIS Components: Offer a range of features designed to enhance and streamline your ETL (Extract, Transform, Load) processes within SQL Server Integration Services (SSIS).
  • Excel Add-ins: Provide a range of features to enhance your Excel experience and improve data connectivity.
  • Delphi Data Access Components (DAC): Powerful and versatile libraries designed to simplify and enhance data access and database connectivity within Delphi applications.
  • Devart dbExpress Drivers: Specialized software components designed to enhance and streamline data connectivity between Delphi applications and a wide range of database systems.

Join the Celebration

During the promotional period (August 15th — August 31st, 2023), simply visit the Devart website and browse through their impressive collection of Data Connectivity solutions.

With this offer, Devart extends its gratitude to its customers and reaffirms its commitment to providing innovative solutions that empower businesses to thrive in an interconnected world.

Take advantage of this opportunity to enhance your applications, streamline your workflows, and unlock new possibilities in the realm of data connectivity.


Google’s AI search experience adds AI-powered summaries, definitions and coding improvements

by Sarah Perez (@sarahintampa)

Google today is rolling out a few new updates to its nearly three-month-old Search Generative Experience (SGE), the company’s AI-powered conversational mode in Search, with the goal of helping users better learn and make more sense of the information they discover on the web. The features include tools to see definitions of unfamiliar terms, improvements for understanding and debugging generated code across languages, and an interesting feature that lets you tap into the AI power of SGE while you’re browsing.

The company explains these improvements aim to help people better understand complicated concepts or complex topics, boost their coding skills and more.

One of the new features will let you hover over certain words to preview their definitions and see related images or diagrams related to the topic, which you can then tap on to learn more. This feature will become available across Google’s AI-generated responses to topics or questions related to certain subjects, like STEM, economics, history and others, where you may encounter terms you don’t understand or concepts you want to dive deeper into for a better understanding.

Bing Chat also offers a feature similar to this that lets you highlight the text in question on a page and then select Bing from the options menu to engage in a conversation where you can learn more about the topic.


Another new feature will help those using SGE for programming assistance. The new capabilities will make it easier to understand and debug the generated code, says Google.

Currently, SGE provides AI-generated overviews that help with tasks related to programming languages and tools, allowing users to find answers to their how-to questions or see generated code snippets. The new update will now color-code segments of code with syntax highlights, making it easier to identify different elements like keywords, comments and strings.

SGE while browsing may be the most interesting new addition, but for now, it’s only an early experiment in Search Labs, available on the Google app for Android and iOS and later, Chrome on the desktop.

The feature is designed to help web users engage with long-form content from publishers and creators, to make it easier to find what they’re looking for.

For instance, on some web pages, you can tap to see an AI-generated list of the key points an article covers, with links that take you directly to the part you were looking for.

The page will also include an “Explore on page” option where you can see the questions an article answers and then jump to the relevant section.

The feature is reminiscent of the existing content highlighting search feature that will jump you to relevant text when you search for specific terms and a matching result is found. However, this time, it’s powered by AI.

Rival Bing Chat also already offers a similar feature that lets you tap the Bing Chat icon while reading an article or document online and ask Chat to summarize the content for you — so this, again, is a little bit of catch-up on Google’s part.


Google notes this feature will not provide AI summaries for paywalled articles and publishers can choose to block the feature by designating their content as paywalled in the Help Center.


Google and Bing are not the only ones thinking about how AI can be used to summarize text. News reading app Artifact, from Instagram’s founders, also began offering an AI summary feature as of its April 2023 update.

SGE while browsing is available as a standalone experiment in Search Labs, and will automatically roll out to those who have already opted in to SGE.

The features follow other recent additions to SGE, including a feature launched earlier this month that shows videos and images related to users’ search queries.

Search Labs is available through the Google app on Android and iOS and Chrome on the desktop.


Learn Data Cleaning and Preprocessing for Data Science with This Free eBook

Data Science Horizons recently released an insightful new ebook titled Data Cleaning and Preprocessing for Data Science Beginners that provides a comprehensive introduction to these critical early stages of the data science pipeline. In the guide, readers will learn why properly cleaning and preprocessing data is so important for building effective predictive models and drawing reliable conclusions from analyses. The ebook covers the general workflow of collecting, cleaning, integrating, transforming, and reducing data in preparation for analysis. It also explores the iterative nature of data cleaning and preprocessing that makes this process as much an art as it is a science.

Why is such a book needed?

In essence, data is messy. Real-world data, the kind that companies and organizations collect every day, is filled with inaccuracies, inconsistencies, and missing entries. As the saying goes, "Garbage in, garbage out." If we feed our predictive models with dirty, inaccurate data, the performance and accuracy of our models will be compromised.

A major highlight of the ebook is the hands-on demonstration of key Python libraries used for data manipulation, visualization, machine learning, and handling missing values. Readers will become familiar with essential tools like Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, and Missingno. The guide concludes with a case study that enables readers to apply all of the concepts and skills covered in the previous chapters.

Data Cleaning and Preprocessing provides a comprehensive guide to tackling common data quality issues. It explores techniques for handling missing values, detecting outliers, normalizing and scaling data, selecting features, encoding variables, and balancing imbalanced datasets. Readers will learn best practices for assessing data integrity, merging datasets, and handling skewed distributions and nonlinear relationships. With its Python code examples, readers will gain practical experience identifying data anomalies, imputing missing data, extracting features, and preprocessing messy datasets into a form ready for analysis. The case study ties together all the major concepts into an end-to-end data cleaning and preprocessing workflow.
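As a small taste of that workflow (a hypothetical toy dataset, using the Pandas and Scikit-learn libraries named above; this is a sketch, not an excerpt from the ebook), identifying missing values, imputing them, and scaling the result might look like:

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Hypothetical messy dataset with missing entries.
    df = pd.DataFrame({
        "age":    [25, 32, np.nan, 41, 29],
        "income": [48_000, 54_000, 61_000, np.nan, 52_000],
    })

    print(df.isna().sum())                            # identify missing values per column

    df["age"] = df["age"].fillna(df["age"].median())  # impute with the median
    df["income"] = df["income"].fillna(df["income"].median())

    # Scale features to zero mean and unit variance, ready for modeling.
    df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])
    print(df)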

At the heart of a data scientist's toolkit is the ability to identify common data quality issues.

Data Cleaning and Preprocessing for Data Science Beginners is a great place to start for anyone eager to get into data science, but still needing to get the hang of dealing with real-world data in all its messy, imperfect glory. This guide really takes you through the nitty-gritty of getting raw data into tip-top shape so you can actually get somewhere with it. By the time you reach the end, you'll have all the know-how you need to clean and preprocess data like it's second nature. No more getting bogged down by wonky, error-filled data! With the skills this ebook arms you with, you'll be able to wrangle even the most unruly datasets into submission and extract meaningful insights like a pro.

Whether you're new to the field or looking to level up your skills, Data Cleaning and Preprocessing for Data Science Beginners is an invaluable addition to your data science library.

Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.


Software Engineers Could Get 30% Time Back Thanks To Generative AI


There’s no doubt that the last few years have represented a turbulent period for employees. The pandemic turned the working lives of many knowledge and tech workers upside down to begin with, and that was swiftly followed by tens of thousands of global job cuts across big tech and financial institutions.

A further reset for many happened this year in terms of return to the office mandates, some of which are being zealously enforced. Last week, some Amazon staff got an email warning them that their in-office attendance was not in line with corporate policy.

Big Brother-style surveillance isn’t generally warmly received by workers. With burnout from workplace stress at an all-time high, what’s making things even more tense is the rise of AI anxiety thanks to the breakout success of generative AI tools such as ChatGPT.

AI anxiety is a term coined by marketing agency Day One, and the company says it is “unease about the overarching ramifications of AI on human creativity and ingenuity”, along with “the sense of foreboding as to whether or not what you’re seeing is being created by man or machine.”

One of the biggest fears workers have when they think about generative AI is around their job security. In fact, the World Economic Forum’s Future of Jobs Report 2023 found that 23% of jobs globally will change in the next five years.

While it expects 69 million new jobs to be created, 83 million will be eliminated, which will result in a net decrease of 14 million jobs, or 2% of current employment.

Other jobs will change radically. According to a new report by McKinsey, generative AI has the potential to automate between 60% and 70% of the work that most of us engage in every day, particularly the work done by those in sales and marketing, customer service, software engineering, and research and development.

Positive upside

But it isn’t all bad news. For many knowledge workers, generative AI tools are likely to help take back time spent on administrative tasks, freeing up their days for more creative or deep work.

A McKinsey report has found that jobs across multiple sectors could benefit from getting up to 30% of their time back thanks to use of generative AI, with the report’s authors stating that “…we see generative AI enhancing the way STEM, creative, and business and legal professionals work rather than eliminating a significant number of jobs outright.”

Among those who could be impacted positively are computer engineers, business and financial specialists and account managers.

For example, one study found that software developers using Microsoft’s GitHub Copilot (an AI pair programmer that helps users write code faster and with less work) completed tasks 56% faster than those not using the tool.

Embrace change

Right now represents a great time to embrace change. By upskilling and looking to develop ways of working with generative AI tools, workers can help to deliver business success. In fact, McKinsey says that generative AI has the potential to generate $2.6 trillion to $4.4 trillion in value across industries.

So, if you’d like to make a switch to a company where you can showcase new skills or be in an environment where you’re encouraged to develop them, the TechRepublic Job Board is the perfect place to start your search. It features thousands of jobs, like the three below.

Machine Learning Engineer/Firefly, Adobe, San Jose

Adobe is seeking a Machine Learning Engineer/Firefly to collaborate closely with multiple product teams. You’ll conduct experiments to continually improve the technology and drive business outcomes, manage and operate large-scale machine learning pipelines, and write high-quality, product-level code that is easy to maintain and test, following best practices.

To succeed, four years’ proficiency in Python and SQL as well as three years’ experience in one or more data science tools such as Pandas, Numpy, Octave, R is required. See more details.

Senior Software Engineer – Community (P2P) – Cash App, Cash App, San Francisco

As a Senior Software Engineer you will play a pivotal role in designing, implementing and maintaining robust, scalable and secure peer-to-peer payment systems. You will scope, build and scale products, systems and services that have an immediate impact on customers and collaborate with cross-functional teams to analyze and understand technical requirements, translating them into technical specifications and designs.

To apply, you’ll need five years of professional experience in software development and a solid understanding of distributed systems, network protocols and data structures. Get more information.

Cloud Computing Application Architect, Lead, Booz Allen Hamilton, McLean

As a Cloud Computing Application Architect you’ll serve as a critical contributor on a team that designs, manages and integrates cloud applications and resources in support of a modern enterprise data platform. You will manage and streamline infrastructure and cloud resources, including containerized applications, to meet the needs of the organization and the platform’s end users.

You’ll need seven years of experience working with software development teams and five years with AWS or Azure. Interested? Apply now.

Accelerate your career today via the TechRepublic Job Board

Written by Kirstie McDermott


VC associates, this AI is coming for your jobs

by Haje Jan Kamps

The unfortunate reality of being an investor is that the vast majority of inbound pitch decks are completely irrelevant to the investor. All VCs big and small have an investment thesis that outlines how they invest: Market size, founder profile, verticals, geography, ownership targets, round size, check size, etc. It’s just how VC works. If you send a pre-seed stage gaming monetization deck to a growth-stage consumer tech fund, or a growth-stage developer tool deck to an early-stage hardware investor, I can guarantee that you are wasting everyone’s time.

The first layer of shit-shield from the barrage of decks is often the associates at a venture firm — and the way Deckmatch describes it, that part of the job can now be parceled out to an AI. The company just raised a €1 million ($1.1 million) round to take its tech from its prototype lab to the dealflow foyers of venture funds around the world.

“I think a lot of the value is generated by categorical acceptance and rejection based on an investment thesis. And then you can add to that logic, say, how big is it? Is it interesting for you as a VC? Does this sort of do something novel in the market at the moment relative to all the other stuff you’re getting pitched?” says Walid Mustapha, co-founder and CTO of Oslo-based Deckmatch, in an interview with TechCrunch. “We’ll start off generating 60% or 70% of the value that an associate gives. Over time, we can really become a proper associate.”

The company’s first phase is to take the unstructured data that lives in a pitch deck and turn it into structured data that can be used as a filter vis-à-vis the VC’s filters. The company has bold plans of aggregating information beyond what’s available in the company’s pitch deck, such as taking an AI-informed guess at market size and market growth, in addition to plugging into other data sources to give investment analysis.

“If the deck is one tile in the mosaic, what are the other tiles we can get? The question is, what can we fetch from the web? What picture could we form?” explains Léo Gasteen, CEO at Deckmatch, in an interview with TechCrunch. “The next step is to make the data go where it’s supposed to go: The CRM system the VC is using. We’ve designed DeckMatch as an API-first product.”

The company has been running in a closed beta test with around 60 VCs to prove that it can add value. It plans to use its funding to further develop its AI and machine learning capabilities, make improvements to its data analysis algorithms and infrastructure, and scale its operations.

The company says it is starting with VC firms and pitch decks but plans to expand its tech to other industries such as recruitment and procurement.

“We envision a future where decision-making processes in venture capital and other industries are data-driven, meaning more time is freed up for more strategic, creative, and human-centric endeavours, such as decision-making and relationship building,” says Gasteen. “When we look at VC, the shift from pen and paper to Word and Excel is probably the most seismic shift the industry has felt to date. We see a curious juxtaposition of VCs being the backers of change, whilst seemingly being immune to change and disruption. We thank our investors for this early investment, which will enable us to enhance and further develop our product, and scale our team.”

The funding round was led by Alliance VC, and other investors included Skyfall Ventures and a smattering of strategic angel investors.