5 AI Models of 2023 that will Transform the Medical Landscape 

Large language models have enabled us to use AI to solve many real-world problems. However, using AI in the medical field is a different ball game altogether, as it requires us to prioritize safety, equity, and fairness. In 2023, various foundation models in the medical domain remained at the forefront of advancements.

Here are the top five models that promise significant impact on healthcare practices.

Med-PaLM 2

Med-PaLM is a language model that uses artificial intelligence to answer medical questions with high accuracy. It has been specifically designed and tested for the medical domain, using medical exams, research, and consumer queries. The latest version of Med-PaLM, called Med-PaLM 2, was unveiled at Google Health’s annual event in March 2023.

This model has an impressive accuracy rate of 86.5% on USMLE-style questions and can provide comprehensive and accurate answers to consumer health questions. Limited testing of Med-PaLM 2 will be conducted soon to explore potential use cases and gather feedback.

AlphaFold 2.3

AlphaFold, a cutting-edge AI system developed by DeepMind, has the ability to predict protein structures computationally with unparalleled accuracy and speed. In collaboration with EMBL’s European Bioinformatics Institute (EMBL-EBI), they have made available more than 200 million AlphaFold-generated protein structure predictions that are openly accessible to the scientific community worldwide.

These predictions cover almost all known catalogued proteins, offering the potential to significantly expand our knowledge of biology. AlphaFold is an AI-based protein-folding solution recognized by the Critical Assessment of Protein Structure Prediction (CASP) community. CASP challenges teams to predict protein structures from amino acid sequences for proteins with known 3D shapes.

Bioformer

Pretrained language models like Bidirectional Encoder Representations from Transformers (BERT) have shown impressive results in natural language processing (NLP) tasks. Recently, BERT has been adapted for the biomedical domain. However, these models have a high number of parameters, making them computationally expensive for large-scale NLP applications.

The creators of Bioformer hypothesized that reducing the number of parameters would not significantly affect performance. They therefore developed Bioformer, a compact BERT model specifically designed for biomedical text mining. Bioformer uses a biomedical vocabulary and was pre-trained from scratch on PubMed abstracts and PubMed Central full-text articles.

The creators trained two Bioformer models – Bioformer8L and Bioformer16L – which reduced the model size by 60% compared to BERTBase.

RoseTTAFold All-Atom

RoseTTAFold is an accurate deep-learning program that models protein structures. It was designed for biomolecules made entirely of amino acids. In 2023, the new upgrade called RoseTTAFold All-Atom was introduced. With this upgrade, the program can model full biological assemblies that contain different types of molecules, including proteins, DNA, RNA, small molecules, metals, and other bonded atoms, including covalent modifications of proteins.

This upgrade is significant because proteins usually interact with other non-protein compounds to function correctly. With RoseTTAFold All-Atom, scientists can model how proteins and small-molecule drugs interact. This capability may be beneficial for drug discovery research.

ChatGLM-6B

It’s believed that training and deploying a dialogue model for hospitals is not feasible, which has hindered the use of LLMs in the medical industry. To address these issues, the developers collected databases of medical dialogues in Chinese with the help of ChatGPT and have used several techniques to train an easy-to-deploy LLM. Notably, the developers were able to fine-tune the ChatGLM-6B on a single A100 80G in just 13 hours, making it very affordable to have a healthcare-purpose LLM.

ChatGLM-6B generates answers that are aligned with human preference. Furthermore, the developers used low-rank adaptation (LoRA) to fine-tune ChatGLM with only 7 million trainable parameters. Fine-tuning on the full Chinese medical dialogue dataset was conducted on an A100 GPU for a duration of 8 hours.
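As a rough illustration of why LoRA keeps the trainable parameter count so small, the sketch below counts the extra weights LoRA adds to a frozen weight matrix. The layer shape (4096 by 12288), the rank of 16 and the 28 adapted layers are assumptions chosen for illustration; the article does not state the configuration the developers actually used.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one frozen d_in x d_out matrix.

    LoRA learns two small matrices, A (d_in x rank) and B (rank x d_out),
    and leaves the original weights untouched.
    """
    return rank * (d_in + d_out)


# Hypothetical configuration (not taken from the article).
HIDDEN_SIZE = 4096        # assumed model width
FUSED_QKV_SIZE = 12288    # assumed width of a fused query/key/value projection
RANK = 16                 # assumed LoRA rank
NUM_LAYERS = 28           # assumed number of adapted layers

per_layer = lora_param_count(HIDDEN_SIZE, FUSED_QKV_SIZE, RANK)
total = per_layer * NUM_LAYERS
full_matrix = HIDDEN_SIZE * FUSED_QKV_SIZE * NUM_LAYERS

print(f"LoRA parameters: {total:,}")
print(f"Frozen weights in the same matrices: {full_matrix:,}")
```

Under these assumed settings the adapters come to roughly 7.3 million weights, in the same range as the figure quoted above, while the matrices they adapt hold about 1.4 billion frozen weights.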

The post 5 AI Models of 2023 that will Transform the Medical Landscape appeared first on Analytics India Magazine.

Meet the Roboticist Who Is Solving Household Issues with Robots

Natural language is, without much disagreement, the segment of AI that gets the most attention today. Every month, there is a breakthrough that everyone goes gaga about. If you wonder what the robotics equivalent of all this is, “there are not many answers,” believes Lerrel Pinto, an Assistant Professor of Computer Science at NYU Courant working in robotics and machine learning.

“The progress in robotics is definitely a lot slower. It’s one of the harder and more impactful problems for several reasons. That’s why we work on robotics,” he shared in an interview with AIM.

In September, Pinto made it to MIT Technology Review’s 2023 Innovators Under 35 list. As per Pieter Abbeel, the director of the robot learning lab at the University of California, Berkeley, Pinto’s current research will be looked back upon as having laid many of the early building blocks of the future of robot learning. And there’s good enough reason to believe so.

According to the 31-year-old researcher, “There’s this promise of robots where eventually we’ll have robots in our homes doing things that we as humans don’t want to do.” Following this frame of mind, a month ago, alongside his team, Pinto introduced Dobb-e, an open-source, general framework for robots to learn household manipulation.

“When you look at most research in robotics, they’re all done in these very constrained lab environments,” Pinto rightly pointed out. He added that every robotics lab has a white tabletop and a neat background behind it. You ask the robot just to pick up one object and place it somewhere. You’ll find that most robotics research papers follow this.

(Source: Google’s PaLM-E demo)

The roboticist ain’t wrong. If you look at the big tech research labs for robotics, they look exactly how Pinto described it. Earlier this year, Google launched PaLM-E, which can integrate vision and language for robotic control. Here’s a picture of the research lab from the company’s demo video, similar to how Pinto described it.

“From my viewpoint, you will never solve hard problems until you actually try to solve them. Suppose you’re always working in a lab environment. In that case, you’re never going to make progress on the actual hard problems,” Pinto believes. His team did the Dobb-e project, “hopefully trying to convince the research community to stop working on toy problems and focus on real problems inside people’s homes,” he said.

Generative AI meets Robotics

Pinto dreams of seeing robots in our environment where they are very close to humans, running chores like cleaning dishes, doing laundry or rearranging objects on a table.

“Now, in the context of these types of problems, generative AI is most useful when you have data,” he said. “If I show the robot some data on how to fold a t-shirt and give it a new t-shirt to fold, that is a generative process. It’s a process of generating robotic behaviour. Tools in generative AI, like diffusion models and transformers, have been directly used on these problems, and they work well. Our lab has worked on some transformer models, and this field is moving fast,” Pinto described.

But the pain point of generative AI is everyone is trying to do something with it. Pinto explained that there’s a bidirectional effect because certain things get popular, and people start doing more topical things. Many people doing niche research start focusing on popular topics, which is a choice people have to make, but there’s a lot of exciting work in the community.

Pinto highlighted a project he is involved in that aims to speed up MRIs, which currently take 30-40 minutes because every frequency is scanned. So there’s this question: can I get an MRI scan without scanning every frequency? “If we can scan at one frequency and get the information, you do not need to waste 30 more minutes like scanning all the other frequencies. That will just make the technology much cheaper and make it possible for more people to access this machine. These types of problems where one can make the system more efficient with AI and impact people’s lives,” he noted.

Choose Data Wisely

The problem of bias in data has been extensively covered since the GPTs gained popularity. With robotics, you have a fascinating problem, where most of the data is generated by roboticists because there is no internet of robot data, the researcher explained. “Most of the robot data is created by researchers like me at Google, Berkeley, and other places to show the robot how to do things. There you have a problem because the choice of what types of tasks you show the robots have a huge impact on where these robots will work,” Pinto mentioned.

For example, Pinto said, suppose he showed the robot how to open doors only in affluent houses. If he then takes this robot to a less affluent household, the doors will look different. The handles will look different, and the robot won’t know how to operate in these houses.

This is the case with household robotics, but he elaborated that even for self-driving, you have the same problem. “Let’s say I train a self-driving car on the streets of San Francisco, and now I want to drive in Detroit or someplace where the streets look very different. It’s not going to work because there is a big mismatch. It’s just going to crash. So you have these types of issues where there are concerns over where you collected this data and what you are trying to deploy,” he said.

Now AI Can Help Fight Human Trafficking

In 2017, Kubiiki Pride’s desperate 270-day search for her 13-year-old daughter led her to the world of the dark web. She rummaged through hundreds of advertisements on backpage.com, which hosted around 70% of online sex ads in the US market, until she found her daughter featured as an escort available to be rented out, where she was disturbingly described as ‘young and new’.

The link with stars and hearts struck her, which she then clicked to find explicit photos of her daughter. The mother had to buy the service to get her daughter back, who was by then addicted to drugs and had been brutally abused.

Human trafficking encompasses recruitment, transportation, entrapment, brokering, delivery, and exploitation of victims. Law enforcement and government agencies have been using artificial intelligence to search through labyrinths of data points to fight this heinous crime.

One of the major tasks for authorities is identifying clues of human trafficking in online sex adverts as thousands of them are posted every week. However, the distinction between human trafficking risk and consensual sex work can be identified through the use of AI and machine learning algorithms by studying the deceptive recruitment behaviours linked to sex sales visible in deep web data.

Law enforcement has increasingly monitored websites like skipthegames.com, backpage.com and others, with 14 million records analysed for suspicious activity.

These records contain post text, location details, phone numbers, and metadata, revealing recruitment strategies leading to sex sales. For instance, recruitment ads for escort or modelling services, devoid of explicit sex service mentions, are linked to the same contact information in multiple sex sales ads, signalling potential deception and trafficking risks.
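The linking idea described above can be sketched in a few lines: group ads by contact number and flag any number that appears in both a recruitment ad and a sex-sales ad. The record layout, the `kind` labels and the sample data are hypothetical; a real system would also need phone-number normalisation and far more careful classification of ads.

```python
from collections import defaultdict

# Hypothetical ad records; the field names are illustrative only.
ads = [
    {"id": 1, "kind": "recruitment", "phone": "555-0101", "city": "Chicago"},
    {"id": 2, "kind": "sales", "phone": "555-0101", "city": "Miami"},
    {"id": 3, "kind": "sales", "phone": "555-0199", "city": "Seattle"},
    {"id": 4, "kind": "recruitment", "phone": "555-0142", "city": "Omaha"},
    {"id": 5, "kind": "sales", "phone": "555-0101", "city": "New York"},
]


def find_linked_phones(records):
    """Return phones used in both recruitment and sales ads, with their ad ids."""
    by_phone = defaultdict(lambda: {"recruitment": [], "sales": []})
    for ad in records:
        by_phone[ad["phone"]][ad["kind"]].append(ad["id"])
    # Keep only numbers that show up on both sides of the pathway.
    return {
        phone: groups
        for phone, groups in by_phone.items()
        if groups["recruitment"] and groups["sales"]
    }


linked = find_linked_phones(ads)
for phone, groups in linked.items():
    print(phone, "recruitment ads:", groups["recruitment"], "sales ads:", groups["sales"])
```

In this toy data only one number links a recruitment ad in one city to sales ads in two others, which is the kind of signal an analyst might then review by hand.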

“We’ve learned that each organisation can have multiple templates they use when they post their ads, and each template is more or less unique to the organisation. By template matching, we essentially have an organisation-discovery algorithm,” says Dr. Lin Li from the Artificial Intelligence (AI) Technology Group at MIT Lincoln Laboratory.

Collaborations between academia and law enforcement agencies are advancing this aspect. Researchers at Lincoln Laboratory, led by Dr. Li, have developed algorithms to extract specific signatures from images associated with trafficking networks. The work involves leveraging machine learning to identify potential trafficking activities within online sex ads, even on decentralised platforms. They’ve also created systems to analyse digital evidence, from text to imagery and audio, making it easier for investigators to connect the dots and build cases efficiently.

Additionally, researchers at Carnegie Mellon University have developed an AI-based tool called Traffic Jam, which compares images uploaded by hotel guests to a database of known trafficking locations. It uses facial recognition and geospatial software to identify missing individuals in online ads.

DARPA’s Memex program and IBM’s Traffik Analysis Hub also analyse vast datasets from social media, online ads, and the dark web to detect high-risk locations and patterns.

Identifying Supply Chains

Hamsa Bastani, a Professor at the Wharton School, recently presented a summary of her ongoing work using machine learning and Snorkel AI’s tools to detect and track activities associated with a high risk of global sex trafficking. The work includes analysis of recruitment-to-sex-sales pathways, offering insights into the complex networks facilitating human trafficking.

High-density edges in the visualisation signify significant movement between locations, showcasing the intricate flow of trafficking activities. Expected patterns, such as victims recruited in the Midwest being sold on different coasts in the US and the trafficking of Eastern European women, align with prior anticipations and with mentions of supply chains in India.

The goal is to label new posts and discover unique recruitment patterns to understand trafficking networks comprehensively.

This three-part methodology involves domain-informed vocabulary creation, word embedding, and weak learning to form an ensemble model using Snorkel. Expert labelling forms a balanced dataset, enabling the active learning pipeline to identify new recruitment patterns beyond previous knowledge. Finally, metadata linking recruitment to sales posts helps unravel the trafficking supply chain, which is vital for policy implications and law enforcement strategies.

This analysis is also pivotal for law enforcement agencies, offering a strategic approach to combating trafficking. It recommends coordination between jurisdictions involved in recruitment (point A) and sales (point B) to disrupt the supply chain effectively. By targeting the problem from both ends, law enforcement can enhance the success rate of arrests without overburdening resources. This method of targeted collaboration aligns with the observed trafficking flow and helps direct law enforcement efforts more efficiently.

AI-powered platforms like the Global Emancipation Network’s Minerva are already facilitating cross-border collaboration among anti-trafficking organisations, streamlining communication and data sharing.

Drying the Money Flow

Identifying money trails through Bitcoin pathways or banks is another important aspect of this fight, as it helps cut off the financial backing of such operations and can cause them to collapse completely.

After being trafficked from Hungary to Canada at the age of 21, Tamir, one such survivor turned advocate, recognised the importance of these transactions in exposing the workings of human traffickers’ operations. She implored financial institutions to join the fight, which prompted Peter Warrick, the director of anti-money laundering risk intelligence at the Bank of Montreal, to rally Canada’s top five banks and the financial regulator to launch “Project Protect.”

The evolving landscape of financial technology, or FinTech, has introduced a new frontier in combating crime. Companies like QuantaVerse are at the forefront of using machine learning technology to detect potential criminal activities within financial transactions.

Organisations like Liberty Asia also bridge the gap by providing critical on-the-ground intelligence to banks, enabling them to track and identify traffickers globally because such operations extend beyond borders.

This crackdown on illicit funds laundered through the financial system has resulted in fines exceeding $321 billion over nine years for the banks in an effort to push them towards bolstering their anti-money laundering efforts. However, the success in uncovering trafficking networks represents only a fraction of the success in undermining this illicit industry, which is worth a whopping $150 billion every year.

Microsoft Releases Phi-2, Outperforms Gemini Nano, Mistral 7B, and Llama 2 Models

Microsoft has released its Small Language Model (SLM) Phi-2, a 2.7 billion-parameter language model showcasing exceptional reasoning and language understanding abilities.

Phi-2, a Transformer-based model with a next-word prediction objective, was trained on 1.4T tokens from a mix of synthetic and web datasets for NLP and coding. Training took 14 days on 96 A100 GPUs. Phi-2 is a base model that has not undergone alignment through reinforcement learning from human feedback (RLHF) or instruction fine-tuning.
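For a sense of scale, the training figures above can be turned into a back-of-the-envelope throughput estimate. The calculation below assumes that the 1.4T figure is the total number of tokens processed and that all 96 GPUs ran for the full 14 days; neither assumption is confirmed by the announcement, so treat the result as an order-of-magnitude guide only.

```python
TOKENS = 1.4e12          # tokens processed, from the article
DAYS = 14                # training duration, from the article
GPUS = 96                # number of A100 GPUs, from the article
SECONDS_PER_DAY = 24 * 60 * 60

# Assumes every GPU was busy for the whole run.
gpu_seconds = DAYS * SECONDS_PER_DAY * GPUS
gpu_hours = gpu_seconds / 3600
tokens_per_gpu_second = TOKENS / gpu_seconds

print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"Tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
```

That works out to about 32,000 GPU-hours and on the order of 12,000 tokens per GPU per second under these assumptions.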

Despite its modest 2.7 billion parameters, Phi-2 outperforms Mistral and Llama-2 models, both at 7B and 13B parameters, across various aggregated benchmarks. Particularly noteworthy is its superior performance compared to the significantly larger 70B-parameter Llama-2 model in multi-step reasoning tasks, such as coding and math.

Furthermore, Phi-2 matches or outperforms the recently-announced Google Gemini Nano 2, despite being smaller in size.

Microsoft couldn’t help but make a subtle reference to Google’s staged demo video for Gemini, which received significant criticism. In the video, Google showcased its upcoming AI model, Gemini Ultra, solving complex physics problems and rectifying students’ errors.

Interestingly, Microsoft highlighted that despite Phi-2 likely being a fraction of the size of Gemini Ultra, it demonstrated the ability to provide accurate answers and correct students using similar prompts.

Indian Startup Sarvam AI Launches Hindi LLM, OpenHathi 

Indian AI startup Sarvam AI has released OpenHathi-Hi-v0.1, the first Hindi LLM in the OpenHathi series. Developed on a budget-friendly platform, the model, an extension of Llama2-7B, boasts GPT-3.5-like performance for Indic languages.

OpenHathi, featuring a 48K-token extension of Llama2-7B’s tokenizer, undergoes a two-phase training process. The initial phase focuses on embedding alignment, aligning randomly initialized Hindi embeddings, followed by bilingual language modeling, teaching the model cross-lingual attention across tokens.

The model demonstrates robust performance across various Hindi tasks, comparable to, if not surpassing, GPT-3.5, while maintaining English proficiency. Sarvam AI’s evaluation includes non-academic, real-world tasks alongside standard Natural Language Generation (NLG) tasks. Evaluations against GPT-3.5 generation with GPT-4 as the judge revealed superior performance in Hindi, both in native and Romanised scripts.

Developed in collaboration with academic partners at AI4Bharat, who contributed language resources and benchmarks, and fine-tuned in partnership with KissanAI, the model leverages conversational data from a bot interacting with farmers in multiple languages.

KissanAI recently announced the launch of Dhenu 1.0, a groundbreaking Agriculture Large Language Model. Tailored specifically for Indian agricultural practices, this bilingual model comprehends English, Hindi, and Hinglish queries, a notable feature catering directly to farmers’ linguistic needs.

Pratyush Kumar and Vivek Raghavan, co-founders of Sarvam AI, launched the startup in July 2023, backed by $41 million in Series A funding led by Lightspeed, with participation from Peak XV Partners and Khosla Ventures.

Stemming from the founders’ background in AI research and digital infrastructure development, the startup aims to cater to India’s unique needs, prioritising Generative AI integration for diverse Indian languages and fostering collaborations with enterprises for domain-specific AI model development using their data.

You can find the base model here.

Tesla Optimus Just Got Brand New Hands, Speed Boost

Tesla’s Optimus just got a major overhaul. With the December update, Optimus-Gen 2 has new actuators and sensors. It now has a 2-degree-of-freedom (DoF) actuated neck, allowing more movement.

With new electric motors, there is a 30% walking speed boost, along with torque sensing, articulated toe sections, and human foot geometry. These, together with a 10 kg weight reduction, give Optimus improved balance and easier full-body control.

There’s a new bot in town 🤖
Check this out (until the very end)! https://t.co/duFdhwNe3K pic.twitter.com/8pbhwW0WNc

— Tesla Optimus (@Tesla_Optimus) December 13, 2023

The best part is the improvement in the hands, which now have faster motion and 11 DoF. With tactile sensing on all fingers, the humanoid can handle delicate objects easily. In the video, Optimus holds an egg between the thumb and forefinger of its left hand, transfers it to the other hand, and then places it gently on the table.

The video ends with two Optimus robots dancing under disco lights.

Almost a year ago, Elon Musk unveiled the prototype of Tesla’s Optimus. It was still in the early stages of development and not much was revealed by the company regarding how exactly it functioned. Now, Tesla has announced major improvements in its humanoid robots and it looks like it is moving closer to what Musk has envisioned for Optimus.

Last year, Optimus just waved on the stage. In September, it could pick up and sort objects, do yoga, and navigate through surroundings. Moreover, compared to others such as Boston Dynamics that work on rule-based systems, Optimus works on neural networks.

Meta just gave its $299 smart glasses their biggest AI upgrade yet, and I’m beyond excited

When Meta first launched its Ray-Ban smart glasses, there was one feature that I was excited to try but couldn't. The promise of a multimodal AI device capable of answering questions based on what the user was staring at sounded like a dream wearable, but Meta wouldn't be rolling out that functionality to its $299 smart glasses until "next year." That idealized future may be closer than I anticipated.

Today, the company is launching an early access program that will allow Ray-Ban Meta smart glasses users to test the new multimodal AI features, all of which leverage the onboard camera and microphones to process environmental data and provide contextual information such as what a user is staring at.

How it all works is rather straightforward. You start a Meta AI prompt by saying, "Hey Meta, take a look at this," followed by the specifics. For example, "Hey Meta, take a look at this plate of food and tell me what ingredients were used." To answer the question, the glasses capture an image of what's in front of you and then break down the various subjects and elements with generative AI.

The functionality goes beyond the usual "What is this building?" or "What's the weather like today?" prompts, of course, as Meta CEO, Mark Zuckerberg, demoed in an Instagram reel. In the video, Zuckerberg asks Meta AI, "Look and tell me what pants to wear with this shirt." as he holds up a rainbow-striped button-down. Not only does the voice assistant identify the apparel, but it suggests pairing it with dark-washed jeans or solid-colored trousers. (The real question is do tech CEOs actually wear outfits beyond the monochromatic t-shirts and dark-colored pants.)

(Side note: Up until today, Meta AI on the Ray-Ban glasses had a knowledge cutoff of December 2022. According to Meta CTO Andrew Bosworth, they now have access to real-time info thanks to Bing.)

Only a small batch of users will receive the new update at first, as Meta plans to collect feedback and refine its upcoming AI features before the official releases. To participate, update the Meta View app to the latest version, tap on the gear icon in the bottom right of the menu bar, swipe down to "Early Access," and tap "Join Early Access."

I'm not seeing anything resembling an early access program on my Android and iOS apps, but you can bet that when the update comes along, I'll be quick to download and start testing it — because what was already one of the most useful tech gadgets I tested in 2023 is about to become even more useful.

EU’s AI Act: Europe’s New Rules for Artificial Intelligence

The European Union reached a provisional agreement on its much-anticipated Artificial Intelligence Act on Dec. 8, becoming the first global power to pass rules governing the use of AI.

The legislation outlines EU-wide measures designed to ensure that AI is used safely and ethically, and includes limitations on the use of live facial recognition and new transparency requirements for developers of foundation AI models like ChatGPT.

Jump to:

  • What is the AI Act?
  • What are the penalties for breaching the AI Act?
  • How significant is the AI Act?
  • What have been some challenges associated with the AI Act?
  • What are critics saying about the AI Act?
  • What’s next for the AI Act?

What is the AI Act?

The AI Act is a set of EU-wide legislation that seeks to place safeguards on the use of artificial intelligence in Europe, while simultaneously ensuring that European businesses can benefit from the rapidly evolving technology.

The legislation establishes a risk-based approach to regulation that categorizes artificial intelligence systems based on their perceived level of risk to and impact on citizens.

The following use cases are banned under the AI Act:

  • Biometric categorisation systems that use sensitive characteristics (e.g., political, religious, philosophical beliefs, sexual orientation, race).
  • Untargeted scraping of facial images from the internet or CCTV footage to create facial recognition databases.
  • Emotion recognition in the workplace and educational institutions.
  • Social scoring based on social behaviour or personal characteristics.
  • AI systems that manipulate human behaviour to circumvent their free will.
  • AI used to exploit the vulnerabilities of people due to their age, disability, social or economic situation.

However, there are caveats to the provisional agreement as it currently stands. Perhaps most significant is the fact that the AI Act won’t come into force until 2025, leaving a regulatory vacuum in which companies will be able to develop and deploy AI unfettered and without any risk of penalties. Until then, companies will be expected to abide by the legislation voluntarily, essentially leaving them free to self-govern.

What do AI developers need to know?

Developers of AI systems deemed to be high risk will have to meet certain obligations set by European lawmakers, including mandatory assessment of how their AI systems might impact the fundamental rights of citizens. This applies to the insurance and banking sectors, as well as any AI systems with “significant potential harm to health, safety, fundamental rights, environment, democracy and the rule of law.”

AI models that are considered high-impact and pose a systemic risk – meaning they could cause widespread problems if things go wrong – must follow more stringent rules. Developers of these systems will be required to perform evaluations of their models, as well as “assess and mitigate systemic risks, conduct adversarial testing, report to the (European) Commission on serious incidents, ensure cybersecurity and report on their energy efficiency.” Additionally, European citizens will have a right to launch complaints and receive explanations about decisions made by high-risk AI systems that impact their rights.

To support European startups in creating their own AI models, the AI Act also promotes regulatory sandboxes and real-world-testing. These will be set up by national authorities to allow companies to develop and train their AI technologies before they’re introduced to the market “without undue pressure from industry giants controlling the value chain.”

What about ChatGPT and generative AI models?

Providers of general-purpose AI systems must meet certain transparency requirements under the AI Act; this includes creating technical documentation, complying with European copyright laws and providing detailed information about the data used to train AI foundation models. The rule applies to models used for generative AI systems like OpenAI’s ChatGPT.

What are the penalties for breaching the AI Act?

Companies that fail to comply with the legislation face fines ranging from €35 million ($38 million USD) or 7% of global turnover to €7.5 million ($8.1 million USD) or 1.5% of turnover, depending on the infringement and size of the company.
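To make the tiers concrete, here is a small sketch of how such a fine could be computed. It assumes the applicable amount is the higher of the fixed sum and the percentage of turnover, which the article does not spell out; the tier names and the function are illustrative and not taken from the legislation.

```python
# Fixed amount in euros and percentage of global turnover, from the article.
TIERS = {
    "most_serious": (35_000_000, 7.0),
    "least_serious": (7_500_000, 1.5),
}


def max_fine_eur(global_turnover_eur: float, tier: str) -> float:
    """Upper bound on the fine, ASSUMING 'whichever is higher' applies."""
    fixed_amount, percent = TIERS[tier]
    return max(fixed_amount, global_turnover_eur * percent / 100)


for turnover in (100_000_000, 2_000_000_000):
    for tier in TIERS:
        print(f"turnover €{turnover:,} / {tier}: €{max_fine_eur(turnover, tier):,.0f}")
```

Under this assumption, a company with €100 million in turnover would face the fixed amounts, while one with €2 billion in turnover would face the percentage-based amounts (€140 million and €30 million respectively).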

How significant is the AI Act?

Symbolically, the AI Act represents a pivotal moment for the AI industry. Despite its explosive growth in recent years, AI technology remains largely unregulated, leaving policymakers struggling to keep up with the pace of innovation.

The EU hopes that its AI rulebook will set a precedent for other countries to follow. Posting on X (formerly Twitter), European Commissioner Thierry Breton labelled the AI Act “a launchpad for EU startups and researchers to lead the global AI race,” while Dragos Tudorache, MEP and member of the Renew Europe Group, said the legislation would strengthen Europe’s ability to “innovate and lead in the field of AI” while protecting citizens.

What have been some challenges associated with the AI Act?

The AI Act has been beset by delays that have eroded the EU’s position as a frontrunner in establishing comprehensive AI regulations. Most notable has been the arrival and subsequent meteoric rise of ChatGPT late last year, which had not been factored into plans when the EU first set out its intention to regulate AI in Europe in April 2021.

As reported by Euractiv, this threw negotiations into disarray, with some countries expressing reluctance to include rules for foundation models on the basis that doing so could stymie innovation in Europe’s startup scene. In the meantime, the U.S., U.K. and G7 countries have all taken strides towards publishing AI guidelines.

What are critics saying about the AI Act?

Some privacy and human rights groups have argued that these AI regulations don’t go far enough, accusing the EU lawmakers of delivering a watered-down version of what they originally promised.

Privacy rights group European Digital Rights labelled the AI Act a “high-level compromise” on “one of the most controversial digital legislations in EU history,” and suggested that gaps in the legislation threatened to undermine the rights of citizens.

The group was particularly critical of the Act’s limited ban on facial recognition and predictive policing, arguing that broad loopholes, unclear definitions and exemptions for certain authorities left AI systems open to potential misuse in surveillance and law enforcement.

Ella Jakubowska, senior policy advisor at European Digital Rights, said in a statement:
“It’s hard to be excited about a law which has, for the first time in the EU, taken steps to legalise live public facial recognition across the bloc. Whilst the Parliament fought hard to limit the damage, the overall package on biometric surveillance and profiling is at best lukewarm. Our fight against biometric mass surveillance is set to continue.”

Amnesty International was also critical of the limited ban on AI facial recognition, saying it set “a devastating global precedent.”

Mher Hakobyan, advocacy advisor on artificial intelligence at Amnesty International, said in a statement: “The three European institutions – Commission, Council and the Parliament – in effect greenlighted dystopian digital surveillance in the 27 EU Member States, setting a devastating precedent globally concerning artificial intelligence (AI) regulation.

“Not ensuring a full ban on facial recognition is therefore a hugely missed opportunity to stop and prevent colossal damage to human rights, civic space and rule of law that are already under threat throughout the EU.”

What’s next with the AI Act?

The AI Act is now pending formal adoption by both the European Parliament and the Council in order to be enacted as European Union legislation. The agreement will be subject to a vote in an upcoming meeting of the Parliament’s Internal Market and Civil Liberties committees.

How to use DALL-E 3 in ChatGPT

DALL-E 3 in ChatGPT

OpenAI recently added DALL-E 3, its most powerful artificial intelligence image generator to date, to ChatGPT Plus and Enterprise subscriptions. Paid ChatGPT subscribers can now access DALL-E 3 within the AI chatbot; it has yet to become available through OpenAI Labs.


DALL-E 3 is also available through the Bing Image Creator, which gives users the chance to access the AI image generator for free through a Microsoft account.

How to create AI images using DALL-E 3 in ChatGPT

I like to ask AI to generate images that cannot be real, but it can generate pretty much any image you'd like, as long as it doesn't violate the generator's terms of use.

What you'll need: As stated above, using DALL-E 3 in ChatGPT requires a Plus or Enterprise subscription because DALL-E 3 is only available within GPT-4. A Plus subscription costs $20 per month, while the cost for the Enterprise tier varies depending on the size of the organization. You can learn more about subscribing to ChatGPT Plus or read on if you're already a subscriber.

Log in or sign up to ChatGPT.

Select GPT-4.

I used the prompt, "a photo of a blue alligator driving a spaceship with planet Earth in the background."

Click the arrow on the top right to download the image.

FAQs

Can I access DALL-E 3 without a ChatGPT Plus subscription?

DALL-E 3 isn't available in OpenAI Labs like DALL-E 2 has been for over a year. But the company advised it would be adding the latest version to Labs this fall.

For now, you can use DALL-E 3 for free in the Bing Image Creator. Just log in or create a Microsoft account and start creating pictures.

Does DALL-E 3 have a limit?

Within ChatGPT Plus, DALL-E 3 has the same limit as GPT-4, which is 40 messages every three hours.

The Bing Image Creator doesn't limit DALL-E 3, but it gives you 'Boosts,' which are tokens that let you create images faster. Each account starts with about 100 boosts. After these boosts run out, your images will take longer to generate.

Disclaimer: You should consider the legal consequences (e.g. copyright) of using AI-generated images before implementing them into your work.


Concept Sliders: Precise Control in Diffusion Models with LoRA Adaptors


Thanks to their capabilities, text-to-image diffusion models have become immensely popular in the artistic community. However, current models, including state-of-the-art frameworks, often struggle to maintain control over the visual concepts and attributes in the generated images, leading to unsatisfactory outputs. Most models rely solely on text prompts, which poses challenges in modulating continuous attributes like the intensity of weather, sharpness of shadows, facial expressions, or age of a person precisely. This makes it difficult for end-users to adjust images to meet their specific needs. Furthermore, although these generative frameworks produce high-quality and realistic images, they are prone to distortions like warped faces or missing fingers.

To overcome these limitations, developers have proposed the use of interpretable Concept Sliders. These sliders promise greater control for end-users over visual attributes, enhancing image generation and editing within diffusion models. Concept Sliders in diffusion models work by identifying a parameter direction corresponding to an individual concept while minimizing interference with other attributes. The framework creates these sliders using sample images or a set of prompts, thus establishing directions for both textual and visual concepts.

Ultimately, using Concept Sliders in text-to-image diffusion models can yield image generation with minimal interference between attributes and enhanced control over the final output, while also increasing perceived realism without altering the content of the images. In this article, we will discuss Concept Sliders in text-to-image frameworks in greater depth and analyze how their use can result in higher-quality AI-generated images.

An Introduction to Concept Sliders

As previously mentioned, current text-to-image diffusion frameworks often struggle to control visual concepts and attributes in generated images, and many find it challenging to modulate continuous attributes, which leads to unsatisfactory results. Concept Sliders may help mitigate these issues, giving content creators and end-users enhanced control over the image generation process.

Most current text-to-image diffusion models rely on direct text prompt modification to control image attributes. While this approach allows image generation, it is not optimal, as changing the prompt can drastically alter the image's structure. Another approach involves post-hoc techniques, which invert the diffusion process and modify cross-attentions to edit visual concepts. However, post-hoc techniques have limitations: they support only a limited number of simultaneous edits and require an individual inference pass for each new concept. Additionally, they can introduce conceptual entanglement if not engineered carefully.

In contrast, Concept Sliders offer a more efficient solution for image generation. These lightweight, easy-to-use adaptors can be applied to pre-trained models, enhancing control and precision over desired concepts in a single inference pass with minimal entanglement. Concept Sliders also enable the editing of visual concepts not covered by textual descriptions, a feature distinguishing them from text-prompt-based editing methods. While image-based customization methods can effectively add tokens for image-based concepts, they are difficult to implement for editing images. Concept Sliders, on the other hand, allow end-users to provide a small number of paired images defining a desired concept. The sliders then generalize this concept and automatically apply it to other images, aiming to enhance realism and fix distortions such as malformed hands.

Concept Sliders draw on, and aim to address issues common to, four areas of generative AI and diffusion research: Image Editing, Guidance-based Methods, Model Editing, and Semantic Directions.

Image Editing

Current AI frameworks either use a conditional input to guide the image structure, or manipulate the cross-attentions of a source image with its target prompt to enable single-image editing in text-to-image diffusion frameworks. As a result, these approaches can be applied only to single images, and they also require latent basis optimization for every image because the geometric structure evolves over timesteps across prompts.

Guidance-based Methods

Classifier-free guidance-based methods have shown an ability to enhance the quality of generated images and boost text-image alignment. By incorporating guidance terms during inference, these methods improve the limited compositionality inherent in diffusion frameworks, and they can also be used to guide generation away from unsafe concepts.

Model Editing

Concept Sliders can also be seen as a model editing technique that employs a low-rank adaptor to target a single semantic attribute, allowing continuous control aligned with that attribute. Related fine-tuning-based customization methods personalize a framework by adding new concepts. For example, the Custom Diffusion technique fine-tunes cross-attention layers to incorporate new visual concepts into pre-trained diffusion models, while the Textual Inversion technique optimizes an embedding vector to activate model capabilities and introduce textual concepts into the framework.

Semantic Direction in GANs

Manipulation of semantic attributes is one of the key capabilities of Generative Adversarial Networks, whose latent space trajectories have been found to align with semantic attributes in a self-supervised manner. In diffusion frameworks, these latent space trajectories exist in the middle layers of the U-Net architecture, and the principal direction of latent spaces captures global semantics. Concept Sliders train low-rank subspaces corresponding to specific attributes directly, and obtain precise and localized editing directions by using text or image pairs to optimize global directions.

Concept Sliders: Architecture and Working

Diffusion Models and LoRA (Low-Rank Adaptors)

Diffusion models are a subclass of generative AI frameworks that synthesize data by reversing a diffusion process. The forward diffusion process gradually adds noise to the data, moving it from an organized state to complete Gaussian noise. The primary aim of diffusion models is to reverse this process: starting from a sample of random Gaussian noise, the model gradually denoises it to generate an image. In practice, the model is trained to predict the true noise given the noisy input together with additional inputs such as conditioning and the timestep.
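To make the forward process concrete, here is a minimal NumPy sketch of the closed-form noising step commonly used in diffusion models. The linear schedule, its start and end values, the 1000-step horizon and the function names are illustrative assumptions; the article does not specify any of them.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the values are illustrative only."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t directly from x_0:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)      # fraction of signal surviving up to each step

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a tiny image
x_early, _ = forward_diffuse(x0, 10, alpha_bar, rng)            # still close to x0
x_late, noise_late = forward_diffuse(x0, 999, alpha_bar, rng)   # almost pure noise
```

At an early timestep the sample still resembles the original data, while at the last timestep it is nearly indistinguishable from the noise that was added. A denoising network is trained to recover that noise from the noisy sample.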

The LoRA (Low-Rank Adaptors) technique decomposes weight updates during fine-tuning to enable efficient adaptation of large pre-trained frameworks to downstream tasks. For a pre-trained model layer, LoRA factors the weight update into two smaller matrices spanning the input and output dimensions, which constrains the update to a low-dimensional subspace.
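The low-rank decomposition can be sketched in a few lines of NumPy. The layer sizes, the rank, the initialization and the name `lora_forward` are assumptions chosen for illustration, not values taken from the Concept Sliders work.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 128, 4            # illustrative sizes, not from the article

W0 = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight
B = np.zeros((d_out, rank))               # trainable; zero init so the update starts at zero
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable

def lora_forward(x, W0, A, B, scale=1.0):
    """Apply the frozen weight plus the low-rank update scale * (B @ A)."""
    return x @ W0.T + scale * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x, W0, A, B)

full_params = W0.size                # parameters touched by full fine-tuning
lora_params = A.size + B.size        # parameters trained with the low-rank update
```

With these sizes the adaptor trains 768 parameters instead of the 8,192 in the full weight matrix, and the update `B @ A` can never exceed rank 4.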

Concept Sliders

Concept Sliders fine-tune LoRA adaptors on a diffusion framework to give a greater degree of control over concept-targeted images, as demonstrated in the following image.

When conditioned on target concepts, Concept Sliders learn low-rank parameter directions that either increase or decrease the expression of specific attributes. Given a model and a target concept, the goal is to obtain an enhanced model that, when conditioned on the target concept, increases the likelihood of the attributes to be enhanced and decreases the likelihood of the attributes to be suppressed. Using reparameterization and Tweedie’s formula, the framework introduces a time-varying noise process and expresses each score as a denoising prediction. Furthermore, the disentanglement objective fine-tunes the Concept Slider modules while keeping the pre-trained weights constant, and the scaling factor introduced in the LoRA formulation is modified during inference. This scaling factor adjusts the strength of the edit, making edits stronger without retraining the framework, as demonstrated in the following image.
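The objective described above can be written compactly. The block below is a sketch of one way to state it; this article does not spell it out, so treat it as an assumption. The symbols are introduced here and do not appear in the article: c_t is the target concept, c_+ and c_- are the attributes to enhance and suppress, η is a guidance strength, θ is the original model and θ* the fine-tuned one.

```latex
\begin{align}
P_{\theta^*}(X \mid c_t) &\leftarrow P_{\theta}(X \mid c_t)
  \left( \frac{P_{\theta}(c_+ \mid X)}{P_{\theta}(c_- \mid X)} \right)^{\eta} \\
\epsilon_{\theta^*}(X, c_t, t) &\leftarrow \epsilon_{\theta}(X, c_t, t)
  + \eta \left( \epsilon_{\theta}(X, c_+, t) - \epsilon_{\theta}(X, c_-, t) \right)
\end{align}
```

The first line raises the likelihood of the enhancing attribute and lowers that of the suppressing attribute. The second line is the same statement after each score has been expressed as a denoising prediction, which gives a target the slider can be trained to match.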

Earlier editing methods achieved stronger edits by retraining the framework with increased guidance. Adjusting the scaling factor during inference, however, produces the same editing results without the additional retraining cost and time.
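A small sketch of what adjusting the scaling factor at inference looks like for a single layer is given below. The matrices are random stand-ins for a frozen weight and a trained slider, and `slider_weight` is a hypothetical helper rather than part of any library.

```python
import numpy as np

rng = np.random.default_rng(1)
W0 = rng.standard_normal((16, 32))        # frozen base weight (stand-in)
B = rng.standard_normal((16, 2)) * 0.1    # stand-in for a trained slider
A = rng.standard_normal((2, 32)) * 0.1

def slider_weight(scale):
    """Effective weight for a given slider strength; no retraining involved."""
    return W0 + scale * (B @ A)

x = rng.standard_normal(32)
base = W0 @ x

# How far the layer output moves from the base model at each slider strength.
shifts = {
    s: np.linalg.norm(slider_weight(s) @ x - base)
    for s in (-2.0, -1.0, 0.0, 1.0, 2.0)
}
```

A scale of zero reproduces the base model, and the size of the shift grows linearly with the scale. Negative scales move the output the same distance in the opposite direction, which is what lets one slider both increase and decrease an attribute.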

Learning Visual Concepts

Concept Sliders are designed to control visual concepts that text prompts cannot define well, and they are trained on small datasets of paired before/after images. The contrast between the images in each pair allows the sliders to learn the visual concept. Furthermore, the training process optimizes the LoRA component in both the forward and reverse directions, so that it aligns with the direction that causes the visual effect in both directions.

Concept Sliders: Implementation Results

To analyze the gain in performance, developers evaluated Concept Sliders primarily on Stable Diffusion XL, a high-resolution 1024-pixel framework, with additional experiments conducted on Stable Diffusion v1.4. The models were trained for 500 epochs each.

Textual Concept Sliders

To evaluate textual Concept Sliders, the method is validated on a set of 30 text-based concepts and compared against two baselines that use a standard text prompt for a fixed number of timesteps and then start composition by adding prompts to steer the image. As can be seen in the following figure, Concept Sliders yield a consistently higher CLIP score and a consistently lower LPIPS score than the original framework without Concept Sliders.

As can be seen in the above picture, Concept Sliders facilitate precise editing of the desired attributes during image generation while maintaining the overall structure of the image.

Visual Concept Sliders

Text-to-image diffusion models that rely only on text prompts often find it difficult to maintain fine control over visual attributes like facial hair or eye shape. To ensure better control over granular attributes, Concept Sliders leverage image datasets paired with optional text guidance. As can be seen in the figure below, Concept Sliders create individual sliders for “eye size” and “eyebrow shape” that capture the desired transformations using the image pairs.

The results can be further refined by providing specific texts so that the direction focuses on that facial region, and creates sliders with stepwise control over the targeted attribute.

Composing Sliders

One of the major advantages of Concept Sliders is composability: users can combine multiple sliders for finer control rather than focusing on a single concept at a time, which is made possible by the low-rank slider directions. Additionally, since Concept Sliders are lightweight LoRA adaptors, they are easy to share and can be easily overlaid on diffusion models. By downloading interesting slider sets, users can adjust multiple knobs simultaneously to steer complex generations.
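Composition can be pictured as adding several scaled low-rank updates to the same frozen weight. The sketch below assumes this additive form; the slider names, sizes, and the helpers `combined_update` and `compose` are hypothetical and used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, rank = 16, 32, 2
W0 = rng.standard_normal((d_out, d_in))   # frozen base weight (stand-in)

def random_slider():
    """Random stand-in for a trained slider, returned as its (B, A) factors."""
    B = rng.standard_normal((d_out, rank)) * 0.1
    A = rng.standard_normal((rank, d_in)) * 0.1
    return B, A

# Hypothetical slider names; each one is an independent low-rank adaptor.
sliders = {"age": random_slider(), "smile": random_slider(), "hair": random_slider()}

def combined_update(sliders, scales):
    """Sum the scaled low-rank updates of every slider that has a scale set."""
    delta = np.zeros((d_out, d_in))
    for name, scale in scales.items():
        B, A = sliders[name]
        delta += scale * (B @ A)
    return delta

def compose(W0, sliders, scales):
    """Overlay the selected sliders on the frozen weight."""
    return W0 + combined_update(sliders, scales)

scales = {"age": 1.5, "smile": -0.5}      # two knobs adjusted at once
W = compose(W0, sliders, scales)
```

Because each slider contributes only a rank-2 update here, combining two of them changes the weight by a matrix of rank at most 4, and leaving every scale unset returns the original weight.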

The following image demonstrates the composition capabilities of Concept Sliders: multiple sliders are composed progressively in each row from left to right, allowing traversal of high-dimensional concept spaces with an enhanced degree of control over the concepts.

Improving Image Quality

Although state-of-the-art text-to-image diffusion frameworks and large-scale generative models like Stable Diffusion XL are capable of generating realistic, high-quality images, they often suffer from distortions like blurry or warped objects, even though their parameters have the latent capability to generate high-quality output with fewer generations. By identifying low-rank parameter directions, Concept Sliders can unlock this latent capability and generate images with fewer distortions.

Fixing Hands

Generating images with realistic-looking hands has always been a hurdle for diffusion frameworks, and Concept Sliders have the potential to directly control the tendency to distort hands. The following image demonstrates the effect of the “fix hands” Concept Slider, which allows the framework to generate images with more realistic-looking hands.

Repair Sliders

Concept Sliders can not only produce more realistic-looking hands, but have also shown potential for improving the overall realism of the generated images. Concept Sliders also identify a single low-rank parameter direction that shifts images away from common distortion issues, and the results are demonstrated in the following image.

Final Thoughts

In this article, we have talked about Concept Sliders, a simple yet scalable new paradigm that enables interpretable control over generated output in diffusion models. Concept Sliders aim to resolve the issues faced by current text-to-image diffusion frameworks, which find it difficult to maintain control over the visual concepts and attributes in a generated image and to modulate continuous attributes, often leading to unsatisfactory output. Concept Sliders might allow text-to-image diffusion frameworks to mitigate these issues and give content creators and end users an enhanced degree of control over the image generation process.