Abbey Road Red’s Monthly Round-up – The Fourth Quarter Edition

8th December 2022

What are the music tech stories you should know about from the last quarter?

Here’s the sixth instalment of a new series sourced by our innovation arm Abbey Road Red and their research across the music technology and start-up ecosystem. Each month, Abbey Road Red's Karim Fanous and Anthony Achille will present a short overview of their top stories — view the selection from the last three months!

Generative AI and Large Language Models

We've been keenly exploring the meeting of AI and music since we launched our music tech incubator, Abbey Road Red, in 2016. Our second cohort included AImusic, which was acquired by Apple this year. Many of our alumni use artificial intelligence and machine learning, and every one of them uses algorithms for at least a minor degree of process automation. We've published prior blog pieces, linked below, and held a vigorously debated Red Talk on the subject well ahead of time.

A plethora of recent developments, particularly in text-to-visual AI tools, together with a sense that AI has made it into the everyday creator toolset, have led us to take a fresh look.

Generative AI is a label used to describe artificial intelligence that uses unsupervised learning algorithms to create new content. It sits within a field of machine learning that uses neural networks, which are models inspired by the structure of the brain. A subsection of generative AI is 'text-to' models, which can create new digital images, video or audio based on text prompts.


What are Large Language Models?

Text-to-image models generally combine two parts: a language model, which transforms the input text into a latent representation, and a generative model, which produces image, video or audio conditioned on that representation.
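To make that two-stage flow concrete, here is a minimal toy sketch in Python. It is not a real model and not how any of the tools mentioned in this piece are implemented: the names encode_text, generate_image, text_to_image and LATENT_DIM are our own illustrative assumptions, the "language model" is just a word hash and the "generator" is simple arithmetic. It only shows how data moves from a text prompt, to a latent representation, to an output conditioned on that representation.

```python
import hashlib
from typing import List

LATENT_DIM = 8  # hypothetical latent size, chosen only for illustration


def encode_text(prompt: str) -> List[float]:
    """Toy stand-in for the language model: text -> fixed-length latent vector."""
    words = prompt.lower().split()
    latent = [0.0] * LATENT_DIM
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(LATENT_DIM):
            # Each byte is scaled into the range 0..1 and accumulated.
            latent[i] += digest[i] / 255.0
    # Average over the words so every value stays between 0 and 1.
    count = max(len(words), 1)
    return [value / count for value in latent]


def generate_image(latent: List[float], size: int = 4) -> List[List[int]]:
    """Toy stand-in for the generative model: latent -> greyscale pixel grid."""
    n = len(latent)
    return [
        [int(255 * (latent[(row + col) % n] + latent[row % n]) / 2)
         for col in range(size)]
        for row in range(size)
    ]


def text_to_image(prompt: str, size: int = 4) -> List[List[int]]:
    """Chain the two stages: encode the prompt, then generate from the latent."""
    return generate_image(encode_text(prompt), size)


if __name__ == "__main__":
    for row in text_to_image("schematics for a futuristic wind musical instrument"):
        print(row)
```

In a real system both stages are large trained neural networks, but the shape of the pipeline is the same: the prompt never reaches the generator directly, only its latent representation does.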

Large Language Models (LLMs) are AI tools that can read, summarise and translate text, and generate new text in a way similar to how humans write and talk. They belong to Natural Language Processing (NLP), the field concerned with how computers handle human language. LLMs and 'text-to' models have been launched by leading tech firms including Google (BERT and LaMDA), Meta AI (OPT-175B) and OpenAI (GPT-3 for text, DALL·E 2 for image).

Not when AI, but how AI?

In terms of mainstream adoption of creative tools underpinned by AI processes, it used to be a question of 'when'. Now that 'when' feels like now, we think the more pragmatic question is 'how'. This tectonic shift in creative toolsets can help human creativity in three ways: freeing up time spent on repetitive tasks so it can go into innovation; widening access to, and democratising, complex tools and creative processes that were previously beyond most people's budgets; and bringing a new generation of storytellers to the surface as new types of creator-led art, stories, music and film are made with the help of assistive AI.

The 'how' can be further hinted at in the swell of recent launch and funding milestones, the tools and products which already exist and those which will evolve out of them.

Recent Milestones

September 2022

Meta unveils Make-A-Video - Meta unveiled a text-to-video creative AI system called Make-A-Video.

Google unveils Imagen Video - Less than a week after Meta’s Make-A-Video announcement, Google detailed its work on Imagen Video, another AI text-to-video generator.

October 2022

Stability AI, the startup behind Stable Diffusion, raises $101M - Stability AI, the company funding the development of open source music- and image-generating systems like Dance Diffusion and Stable Diffusion, announced that it raised $101 million.

AI content generation unicorn Jasper raises $125M at a $1.5B valuation - Jasper leverages AI to generate content for blog articles, social media posts and website copy. Jasper's differentiator is Jasper Art, an AI art-generating system that competes with AI text-to-image generators from the likes of OpenAI and Midjourney. The raise came the day after Stability AI raised $101 million at a $1B valuation, making two unicorn events in the space.

November 2022

DALL·E 2 API becomes available in public beta - After DALL·E 2's pivotal launch in April 2022, OpenAI released its API in public beta, so developers can integrate DALL·E directly into their apps and products and start building with the same technology in a matter of minutes.

OpenAI launches Converge - OpenAI announced a startup accelerator that will provide early-stage AI startups with capital and access to OpenAI's tech and resources.

OpenAI releases ChatGPT - ChatGPT interacts with the user in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises and reject inappropriate requests.

Three Red Takeaways

This isn't about replacing creativity, it's about shifting where it happens. The creative element with ‘text-to’ technologies is in crafting your prompt. This approach also throws up unexpected results, putting the creator in a ‘happy creative accident’ zone which could help inspire different kinds of creativity and results: a new kind of creative approach.

Creativity and tool UIs will become text or voice-based, interactive and conversational. Creative tool interfaces will change as AI tools are adopted, becoming more interactive and conversational, either via text input or voice control. They may even provide feedback or tweak suggestions based on your initial commands and outputs.

It will empower creativity around the creative content. Creating generative visuals with ‘text-to’ prompts may mitigate bandwidth load and cognitive overhead for both marketeers and emerging artists in that they can utilise the technology to create an array of creative assets including album artwork, blogs or promotional marketing content, visualiser/lyric/music videos and more, with small spends.

Guess The Image

Here are some text-to-image generations created by Red using OpenAI's DALL·E 2.

Heard of DALL·E? It has taken pop culture by AI storm. We thought we'd include a few examples of its output below. The prompt we used was ‘schematics for a futuristic x musical instrument’, with the variable x set to one of: stringed, percussion, electronic or wind.
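For anyone who wants to try the same experiment, the prompt variations can be generated with a few lines of Python. This is only a sketch of the templating step: the names PROMPT_TEMPLATE, INSTRUMENT_KINDS and build_prompts are our own illustrative choices, and the code does not call any image generator. Each resulting string would be pasted or sent into the text-to-image tool of your choice.

```python
from typing import List

# The 'x' from the prompt above becomes a named placeholder.
PROMPT_TEMPLATE = "schematics for a futuristic {kind} musical instrument"
INSTRUMENT_KINDS = ["stringed", "percussion", "electronic", "wind"]


def build_prompts(template: str = PROMPT_TEMPLATE,
                  kinds: List[str] = INSTRUMENT_KINDS) -> List[str]:
    """Fill the template once per instrument kind."""
    return [template.format(kind=kind) for kind in kinds]


if __name__ == "__main__":
    for prompt in build_prompts():
        print(prompt)
```

Keeping the template fixed and changing a single word is a simple way to compare how much one variable shifts the output.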

Can you guess which is which?
