The Alignment
Reddit Wants AI Companies to Pay Up

Plus: Stability AI releases a ChatGPT-like model, and AI loves the Pope

The Alignment
April 19, 2023

Welcome to The Alignment! Everyone wants to be part of the AI gold rush. For the first time in a long time, every technology stakeholder has an opportunity to benefit from AI, and so everyone wants a slice of the pie.

As always, if you enjoy reading our posts, be sure to spread the word!

Here’s what we have lined up for you today -

Reddit wants AI companies to pay up
Stability AI announces new open-source large language model
AI loves the Pope
Nvidia is getting into video creation?

Reddit wants AI companies to pay up

OpenAI has been training its models on Reddit’s data for quite some time. Google’s Bard and Microsoft’s Bing have also been trained on the social network’s data.

But, as they say, there is no free lunch. Reddit has announced API changes that will charge these companies for pulling Reddit’s data. Data can usually be scraped off a website, but scraped data is often unstructured and requires a lot of post-processing. The Reddit API, which has been freely available since 2008, makes it much easier for developers to find and package the relevant data directly.
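To make the structured-versus-scraped difference concrete, here is a minimal Python sketch. It assumes a response shaped like Reddit's JSON listings (a `Listing` whose `children` each carry a `data` object with fields such as `title`, `selftext` and `score`). The sample data and the `extract_posts` helper are hypothetical, made up for illustration; this is not how any AI company actually ingests Reddit data.

```python
import json

# Hypothetical sample shaped like a Reddit listing response
# ("Listing" -> "children" -> "data"). Real responses carry many more
# fields; these values are invented for illustration.
SAMPLE_LISTING = """
{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"subreddit": "MachineLearning",
                              "title": "What is a latent space?",
                              "selftext": "Asking for a friend.",
                              "score": 42}},
      {"kind": "t3", "data": {"subreddit": "MachineLearning",
                              "title": "Link post with no body",
                              "selftext": "",
                              "score": 7}}
    ]
  }
}
"""


def extract_posts(listing_json: str, min_score: int = 0) -> list[dict]:
    """Turn a listing into text records, skipping posts below min_score."""
    listing = json.loads(listing_json)
    records = []
    for child in listing["data"]["children"]:
        post = child["data"]
        if post["score"] < min_score:
            continue
        # Combine title and body into one record; link posts have no body.
        text = post["title"]
        if post["selftext"]:
            text += "\n\n" + post["selftext"]
        records.append({"subreddit": post["subreddit"], "text": text})
    return records


if __name__ == "__main__":
    for record in extract_posts(SAMPLE_LISTING, min_score=10):
        print(record)
```

Because the fields arrive already labeled, filtering and packaging take a few lines. Doing the same from scraped HTML would mean locating titles and bodies in page markup first.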

Commercial use of the data to train LLMs will now require companies to enter into an agreement with Reddit. This is the rational move for a for-profit company like Reddit. The business model of selling extensive, well-structured data has been around for a long time; now companies will start applying it to data sold specifically for training and developing LLMs.

Not too long ago, Twitter also pulled free access to its API, which was used to train LLMs, among other things.

Reddit CEO Steve Huffman, who is preparing the company for an IPO, had this to say about the decision: “Crawling Reddit, generating value and not returning any of that value to our users is something we have a problem with. It’s a good time for us to tighten things up.”

Stability AI announces new open-source large language model

Stability has been in the press a lot lately, from its famed Stable Diffusion release to a potential fundraise at a $4 billion valuation and, more recently, questions about its business model.

Today, the company released StableLM, its family of open-source large language models. Much like ChatGPT, the models are built for text and code generation. They are trained on a larger version of an open-source dataset called The Pile, which includes data from Wikipedia, Stack Exchange and PubMed. The models currently come in 3 billion and 7 billion parameter sizes, with 15 billion to 65 billion parameter models in the pipeline.

A demo of StableLM is available on Hugging Face.

A few things to note about this release of StableLM:

How big is big? OpenAI and Anthropic have very large models. Models with a few billion parameters look like dwarfs next to theirs, which are believed to have hundreds of billions of parameters.
Smaller models, equal performance? StableLM has up to 7 billion parameters and Databricks’ Dolly model has 6 billion. Their makers claim they compete well with GPT-3.5 on benchmarks.
Models as commodities? Every company is coming out with its own family of LLMs. Meta, Google, OpenAI, Databricks, Stability AI and Stanford all have models intended for more or less the same purpose. How are they going to differentiate?
A war chest to train models. Sam Altman confirmed that training OpenAI’s models cost more than $100 million, and Anthropic believes it needs to raise $5 billion within 4 years. How are upstarts going to compete? Stability says it is competing with the big dogs, but it is going to need a lot more capital.
How far can current models be pushed? Sam Altman recently said at an MIT event, “I think we’re at the end of the era where it’s going to be these, like, giant, giant models. We’ll make them better in other ways.” He suggests that further improvements will come from approaches such as new architectures rather than sheer scale.
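To put "how big is big" in hardware terms, here is a back-of-the-envelope Python sketch of how much memory it takes just to hold each model's weights. It assumes 16-bit weights at 2 bytes per parameter and ignores activations, optimizer state and caches, so real requirements are higher. GPT-3's 175 billion parameters are publicly documented; the sizes of OpenAI's and Anthropic's newer models are not, so GPT-3 stands in for the "hundreds of billions" class. The `weight_memory_gb` helper is hypothetical.

```python
# Back-of-the-envelope, weights-only memory estimate:
#   memory = parameters x bytes per parameter
# Assumes 16-bit (fp16/bf16) weights at 2 bytes each. Activations,
# optimizer state and caches are ignored, so real usage is higher.
BYTES_PER_PARAM_FP16 = 2

MODEL_SIZES = {
    "StableLM 3B": 3e9,
    "Dolly 6B": 6e9,
    "StableLM 7B": 7e9,
    "StableLM 65B (planned)": 65e9,
    "GPT-3 175B": 175e9,
}


def weight_memory_gb(num_params: float,
                     bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in MODEL_SIZES.items():
        print(f"{name:>24}: ~{weight_memory_gb(params):.0f} GB")
```

By this rough measure, a 7 billion parameter model needs about 14 GB and fits on a single 24 GB GPU, while a 175 billion parameter model needs about 350 GB and has to be spread across several data-center GPUs. That gap is part of why smaller models are attractive to upstarts.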

Around the industry -

Google’s Rush to Win in AI Led to Ethical Lapses, Employees Say
Salesforce is working on a pair of new generative AI-driven workflow tools
Cortical Labs raises $10M for its Pong-playing stem cells which eventually could power AI
Chegg announces CheggMate, the new AI companion built for Students
Michael Schumacher's family planning legal action over AI 'interview'
AI companies ask US court to dismiss artists' copyright lawsuit
Snapchat’s AI chatbot is now free for all global users

Product Corner -

There are way too many cool products to showcase just one!

Klu: Find the right information, instantly, across all your apps.
Aomni: An information retrieval AI agent that can find, extract and process any data on the internet for you.
InfraCopilot: The intelligent infra-as-code editor.
Bloks: Notes, tasks and meetings on autopilot.
Reroom: Redesign your room.
Cognify: Transform photos into stunning designs.

Generative AI loves the Pope

Nvidia’s getting into video creation?

NVIDIA just released a very impressive text-to-video paper.
Video Latent Diffusion Models (Video LDMs) use a diffusion model in a compressed latent space to generate high-resolution videos.
Here's a brief overview of how it works:
1. Pre-train image LDM on a dataset of images.… twitter.com/i/web/status/1…
— Lior⚡ (@AlphaSignalAI)
3:13 PM • Apr 19, 2023