Simon Willison’s Weblog

Recent entries

Exploring GPTs: ChatGPT in a trench coat? one day ago

The biggest announcement from last week’s OpenAI DevDay (and there were a LOT of announcements) was GPTs. Users of ChatGPT Plus can now create their own, custom GPT chat bots that other Plus subscribers can then talk to.

My initial impression of GPTs was that they’re not much more than ChatGPT in a trench coat—a fancy wrapper for standard GPT-4 with some pre-baked prompts.

Now that I’ve spent more time with them I’m beginning to see glimpses of something more than that. The combination of features they provide can add up to some very interesting results.

As with pretty much everything coming out of these modern AI companies, the documentation is thin. Here’s what I’ve figured out so far.

Financial sustainability for open source projects at GitHub Universe six days ago

I presented a ten minute segment at GitHub Universe on Wednesday, ambitiously titled Financial sustainability for open source projects.

GitHub invited me to speak as a representative of the GitHub Accelerator program from earlier this year. The goal was to share some of the advice from that program, and talk about my own personal experiences trying to achieve financial sustainability for my Datasette open source project.

To set expectations: Datasette is not yet financially sustainable, at least not in terms of my long-term goals for the project! Fitting everything I’ve explored so far into just ten minutes was a significant challenge.

You can watch my presentation on YouTube, or embedded below. Read on for an annotated version of the slides, based on a Whisper transcript and extended with some extra clarity and links to further reading.

I closed with a call to action for a novel way that companies can help support open source projects: pay maintainers to speak to your team, in the form of time-boxed one hour Zoom consulting calls. Open source developers are often bad at asking for money. If you want to support a project, try pushing money towards them from your existing training budget instead!

ospeak: a CLI tool for speaking text in the terminal via OpenAI 10 days ago

I attended OpenAI DevDay today, the first OpenAI developer conference. It was a lot. They released a bewildering array of new API tools, which I’m only just beginning to wade through and fully understand.

My preferred way to understand a new API is to build something with it, and in my experience the easiest and fastest things to build are usually CLI utilities.

I’ve been enjoying the new ChatGPT voice interface a lot, so I was delighted to see that OpenAI today released a text-to-speech API that uses the same model.

My first new tool is ospeak, a CLI utility for piping text through that API.

ospeak

You can install ospeak like this. I’ve only tested it on macOS, but it might well work on Linux and Windows as well:

pipx install ospeak

Since it uses the OpenAI API you’ll need an API key. You can either pass that directly to the tool:

ospeak "Hello there" --token="sk-..."

Or you can set it as an environment variable so you don’t have to enter it multiple times:

export OPENAI_API_KEY=sk-...
ospeak "Hello there"

Now you can call it and your computer will speak whatever you pass to it!

ospeak "This is really quite a convincing voice"

OpenAI currently have six voices: alloy, echo, fable, onyx, nova and shimmer. The command defaults to alloy, but you can specify another voice by passing -v/--voice:

ospeak "This is a different voice" -v nova 

If you pass the special value -v all it will say the same thing in each voice, prefixing with the name of the voice:

ospeak "This is a demonstration of my voice." -v all

Here’s a recording of the output from that:

You can also set the speed—from 0.25 (four times slower than normal) to 4.0 (four times faster). I find 2x is fast but still understandable:

ospeak "This is a fast voice" --speed 2.0

Finally, you can save the output to a .mp3 or .wav file instead of speaking it through the speakers, using the -o/--output option:

ospeak "This is saved to a file" -o output.mp3

That’s pretty much all there is to it. There are a few more details in the README.

The source code was adapted from an example in OpenAI’s documentation.
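
For reference, here’s roughly what the underlying API call looks like with OpenAI’s Python library. This is a minimal sketch based on their documented text-to-speech endpoint, not the actual ospeak source:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# tts-1 is the standard model; tts-1-hd is a higher quality option
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This is really quite a convincing voice",
)

# Save the generated audio as an MP3 file
response.stream_to_file("output.mp3")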

The real fun is when you combine it with llm, to pipe output from a language model directly into the tool. Here’s how to have your computer give a passionate speech about why you should care about pelicans:

llm -m gpt-4-turbo \
  "A short passionate speech about why you should care about pelicans" \
  | ospeak -v nova

Here’s what that gave me (transcript here):

I thoroughly enjoy how using text-to-speech like this genuinely elevates an otherwise unexciting piece of output from an LLM. This speech engine really is very impressive.

LLM 0.12 for gpt-4-turbo

I upgraded LLM to support the newly released GPT-4 Turbo model—an impressive beast which is roughly 1/3 the price of GPT-4 (technically 3x cheaper for input tokens and 2x cheaper for output) and supports a huge 128,000 token context window, up from 8,000 tokens for regular GPT-4.

You can try that out like so:

pipx install llm
llm keys set openai
# Paste OpenAI API key here
llm -m gpt-4-turbo "Ten great names for a pet walrus"
# Or a shortcut:
llm -m 4t "Ten great names for a pet walrus"

Here’s a one-liner that summarizes the Hacker News discussion about today’s OpenAI announcements using the new model (and taking advantage of its much longer token limit):

curl -s "https://hn.algolia.com/api/v1/items/38166420" | \
  jq -r 'recurse(.children[]) | .author + ": " + .text' | \
  llm -m gpt-4-turbo 'Summarize the themes of the opinions expressed here,
  including direct quotes in quote markers (with author attribution) for each theme.
  Fix HTML entities. Output markdown. Go long.'

Example output here. I adapted that from my Claude 2 version, but I found I had to adjust the prompt a bit to get GPT-4 Turbo to output quotes in the manner I wanted.

I also added support for a new -o seed 1 option for the OpenAI models, which passes a seed integer that more-or-less results in reproducible outputs—another new feature announced today.
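
That option maps to the new seed parameter on the Chat Completions API. Here’s a minimal sketch of passing it directly via the OpenAI Python client (gpt-4-1106-preview was the API model name announced at DevDay):

from openai import OpenAI

client = OpenAI()

# The same seed plus identical parameters should produce
# more-or-less reproducible output
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Ten great names for a pet walrus"}],
    seed=1,
)
print(response.choices[0].message.content)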

So much more to explore

I’ve honestly hardly even begun to dig into the things that were released today. A few of the other highlights:

  • GPT-4 vision! You can now pass images to the GPT-4 API, in the same way as ChatGPT has supported for the past few weeks. I have so many things I want to build on top of this.
  • JSON mode: both GPT-3.5 Turbo and GPT-4 Turbo can now reliably produce valid JSON output. Previously they could produce JSON but would occasionally make mistakes—this mode makes mistakes impossible by altering the token stream as it is being produced (similar to Llama.cpp grammars). There’s a sketch of how to request it after this list.
  • Function calling got some big upgrades, the most important of which is that you can now be asked by the API to execute multiple functions in parallel.
  • Assistants. This is the big one. You can now define custom GPTs (effectively a custom system prompt, set of function calls and collection of documents for use with Retrieval Augmented Generation) using the ChatGPT interface or via the API, then share those with other people.... or use them directly via the API. This makes building simple RAG systems trivial, and you can also enable both Code Interpreter and Bing Browse mode as part of your new assistant. It’s a huge recipe for prompt injection, but it also cuts out a lot of the work involved in building a custom chatbot.
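
For the JSON mode bullet above, here’s a minimal sketch of requesting it through the OpenAI Python client. Note that the API expects the word “JSON” to appear somewhere in your messages when this mode is enabled:

import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    # response_format switches on JSON mode
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with ten great names for a pet walrus",
        }
    ],
)

# The returned content is guaranteed to be valid JSON in this mode
print(json.loads(response.choices[0].message.content))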

Honestly today was pretty overwhelming. I think it’s going to take us all months to fully understand the new capabilities we have around the OpenAI family of models.

It also feels like a whole bunch of my potential future side projects just dropped from several weeks of work to several hours.

DALL-E 3, GPT4All, PMTiles, sqlite-migrate, datasette-edit-schema 17 days ago

I wrote a lot this week. I also did some fun research into new options for self-hosting vector maps and pushed out several new releases of plugins.

On the blog

PMTiles and MapLibre GL

I saw a post about Protomaps on Hacker News. It’s absolutely fantastic technology.

The Protomaps PMTiles file format lets you bundle together vector tiles in a single file which is designed to be queried using HTTP range header requests.

This means you can drop a single 107GB file on cloud hosting and use it to efficiently serve vector maps to clients, fetching just the data they need for the current map area.
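
You can see the range-request mechanism with a few lines of Python. This is just a sketch of how I understand the format: readers fetch the first 16KB, which holds the PMTiles header and root directory, then request only the byte ranges for the tiles they need. The URL here is a placeholder:

import requests

# Placeholder URL: any hosted .pmtiles archive behaves the same way
url = "https://example.com/planet.pmtiles"

# Fetch just the first 16KB of the file
resp = requests.get(url, headers={"Range": "bytes=0-16383"})

print(resp.status_code)                   # 206 Partial Content
print(resp.headers.get("Content-Range"))  # e.g. bytes 0-16383/114977481728
print(len(resp.content))                  # 16384 bytes, not the whole file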

Even better than that, you can create your own subset of the larger map covering just the area you care about.

I tried this out against my hometown of Half Moon Bay and got a building-outline-level vector map for the whole town in just a 2MB file!

You can see the result (which also includes business listing markers from Overture maps) at simonw.github.io/hmb-map.

A vector map of El Granada showing the area around the harbor, with lots of little markers for different businesses. Protomaps (c) OpenStreetMap in the corner.

Lots more details of how I built this, including using Vite as a build tool and the MapLibre GL JavaScript library to serve the map, in my TIL Serving a custom vector web map using PMTiles and maplibre-gl.

I’m so excited about this: we now have the ability to entirely self-host vector maps of any location in the world, using openly licensed data, without depending on anything other than our own static file hosting web server.

llm-gpt4all

This was a tiny release—literally a one line code change—with a huge potential impact.

Nomic AI’s GPT4All is a really cool project. They describe their focus as “a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.”—they’ve taken llama.cpp (and other libraries) and wrapped them in a much nicer experience, complete with Windows, macOS and Ubuntu installers.

Under the hood it’s mostly Python, and Nomic have done a fantastic job releasing that Python core as an installable Python package—meaning you can literally pip install gpt4all to get almost everything you need to run a local language model!

Unlike alternative Python libraries MLC and llama-cpp-python, Nomic have done the work to publish compiled binary wheels to PyPI... which means pip install gpt4all works without needing a compiler toolchain or any extra steps!
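
Here’s a quick sketch of what that looks like in Python, based on the gpt4all package’s documented interface. The model filename is illustrative, and the model downloads the first time you use it:

from gpt4all import GPT4All

# Downloads the model file on first use, then runs it locally
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

print(model.generate("Ten facts about pelicans", max_tokens=200))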

My LLM tool has had an llm-gpt4all plugin since I first added alternative model backends via plugins in July. Unfortunately, it spat out weird debugging information that I had been unable to hide (a problem that still affects llm-llama-cpp).

Nomic have fixed this!

As a result, llm-gpt4all is now my recommended plugin for getting started running local LLMs:

pipx install llm
llm install llm-gpt4all
llm -m mistral-7b-instruct-v0 "ten facts about pelicans"

The latest plugin can also now use the GPU on macOS, a key feature of Nomic’s big release in September.

sqlite-migrate

sqlite-migrate is my plugin that adds a simple migration system to sqlite-utils, for applying changes to a database schema in a controlled, repeatable way.

Alex Garcia spotted a bug in the way it handled multiple migration sets with overlapping migration names, which is now fixed in sqlite-migrate 0.1b0.

Ironically the fix involved changing the schema of the _sqlite_migrations table used to track which migrations have been applied... which is the one part of the system that isn’t itself managed by its own migration system! Instead, I had to implement a conditional check that updates that table’s schema if needed.

A recent thread about SQLite on Hacker News included a surprising number of complaints about the difficulty of running migrations, due to the lack of features of the core ALTER TABLE implementation.

The combination of sqlite-migrate and the table.transform() method in sqlite-utils offers a pretty robust solution to this problem. Clearly I need to put more work into promoting it!
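
To illustrate, here’s a minimal sketch of a migrations file that combines the two, following the patterns in the sqlite-migrate README and the sqlite-utils transform() documentation (the table and function names are illustrative):

from sqlite_migrate import Migrations

migrations = Migrations("myapp")


@migrations()
def create_creatures(db):
    # db is a sqlite-utils Database object
    db["creatures"].create({"id": int, "name": str, "weight_kg": float}, pk="id")


@migrations()
def rename_column(db):
    # transform() rebuilds the table behind the scenes, working around
    # the limitations of SQLite's ALTER TABLE
    db["creatures"].transform(rename={"name": "species"})

You then apply any pending migrations by running the migrate command the plugin adds to sqlite-utils against your database file.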

Homebrew trouble for LLM

I started getting confusing bug reports for my various LLM projects, all of which boiled down to a failure to install plugins that depended on PyTorch.

It turns out the LLM package for Homebrew upgraded to Python 3.12 last week... but PyTorch isn’t yet available for Python 3.12.

This means that while base LLM installed from Homebrew works fine, attempts to install things like my new llm-embed-jina plugin fail with weird errors.

I’m not sure the best way to address this. For the moment I’ve removed the recommendation to install using Homebrew and replaced it with pipx in a few places. I have an open issue to find a better solution for this.

The difficulty of debugging this issue prompted me to ship a new plugin that I’ve been contemplating for a while: llm-python.

Installing this plugin adds a new llm python command, which runs a Python interpreter in the same virtual environment as LLM—useful if you installed LLM via pipx or Homebrew and don’t know where that virtual environment is located.

It’s great for debugging: I can ask people to run llm python -c 'import sys; print(sys.path)' for example to figure out what their Python path looks like.

It’s also promising as a tool for future tutorials about the LLM Python library. I can tell people to pipx install llm and then run llm python to get a Python interpreter with the library already installed, without them having to mess around with virtual environments directly.
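
As a taste of what that could look like, here’s a sketch using the LLM Python API as documented. It assumes you’ve already configured an OpenAI key:

import llm

model = llm.get_model("gpt-4-turbo")
# If the stored key isn't picked up automatically you can set it directly:
# model.key = "sk-..."
response = model.prompt("Ten great names for a pet walrus")
print(response.text())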

Add and remove indexes in datasette-edit-schema

We’re iterating on Datasette Cloud based on feedback from people using the preview. One request was the ability to add and remove indexes from larger tables, to help speed up faceting.

datasette-edit-schema 0.7 adds that feature.

That plugin includes this script for automatically updating the screenshot in the README using shot-scraper. Here’s the latest result:

Screenshot of the edit schema UI - you can rename a table, change existing columns, add a column, update foreign key relationships, change the primary key, delete the table and now edit the table indexes.

Releases this week

TIL this week

Now add a walrus: Prompt engineering in DALL-E 3 21 days ago

Last year I wrote about my initial experiments with DALL-E 2, OpenAI’s image generation model. I’ve been having an absurd amount of fun playing with its sequel, DALL-E 3 recently. Here are some notes, including a peek under the hood and some notes on the leaked system prompt.

DALL-E 3 is available via ChatGPT Plus. For $20/month you get GPT-4 access, DALL-E 3, GPT Vision, Code Interpreter (aka Coding Intern), ChatGPT Plugins and the new, weird and slightly dystopian voice chat feature in the ChatGPT mobile apps. It’s absolutely worth the money, for the combined entertainment value alone.

A fascinating thing about DALL-E 3 is that you don’t prompt it directly: you instead access it via ChatGPT, which prompts the model for you.

ChatGPT then turns your query into several prompts, each producing a different image. The prompts it generates are available if you click on the images (on mobile you have to flip your phone horizontal to see that option).

I started by typing this into ChatGPT with DALL-E enabled:

A super posh pelican with a monocle watching the Monaco F1

It gave me these four images. I’ve included the prompt it used for each one:

Pelican 1
Photo of a refined pelican wearing a shiny gold monocle, perched on a railing with the Monaco Grand Prix F1 race in the background. The race cars blur past on the circuit, while yachts are anchored in the nearby harbor.
Pelican 2
Illustration of a sophisticated pelican donning a crystal clear monocle, observing the Monaco F1 race with keen interest. The race track is filled with speeding F1 cars, and the Mediterranean Sea sparkles in the distance with luxurious yachts.
Pelican 3
Watercolor painting of a posh pelican, its feathers groomed to perfection, equipped with a delicate monocle. It stands tall with the iconic Monaco F1 race happening behind it, cars racing at breakneck speeds and the shimmering sea beyond.
Pelican 4
Vector image of an elegant pelican with a classy monocle, standing gracefully against the backdrop of the Monaco Grand Prix. The F1 cars create a colorful streak as they race, and lavish yachts float in the nearby waters.

Note how the prompts it generates cover four different styles—photo, illustration, watercolor painting and vector image. They also use pleasing and varied language to describe different aspects of the image.

ChatGPT is prompting DALL-E for us!

The really fun thing though is that you can ask it follow-ups.

I said:

More like the first one please

And got back:

Pelican 1
Photo of an aristocratic pelican adorned with a golden monocle, perched confidently with the Monaco Grand Prix F1 race unfolding behind it. The Formula 1 cars zip by on the circuit, while opulent yachts are moored in the nearby marina.
Pelican 2
Photo of a regal pelican with a polished monocle, standing prominently against the backdrop of the Monaco F1 race. The roar of the engines can almost be heard as the cars speed past, and the glinting sea is dotted with luxurious boats.
Pelican 3
Photo of a distinguished pelican sporting a gleaming monocle, observing the Monaco Grand Prix from a vantage point. Racing cars leave a trail of dust as they navigate the circuit, with the blue sea and anchored yachts completing the panorama.
Pelican 4
Photo of a chic pelican, its eye magnified by a pristine monocle, captivated by the Monaco F1 race happening just meters away. The racetrack buzzes with activity as F1 cars race, and the horizon showcases the grandeur of the harbor with its yachts.

Next I added what is becoming my universal DALL-E 3 follow-up prompt, always guaranteed to produce better results:

Add a walrus

The result:

Pelican and a walrus (mostly off screen)
Photo of a posh pelican with a golden monocle and a distinguished walrus, both observing the Monaco Grand Prix F1 race. The pelican is perched on a railing while the walrus lounges nearby. Formula 1 cars race by on the circuit, with lavish yachts in the marina behind them.
Pelican and a walrus, excellent
Photo of an aristocratic pelican and a stately walrus side by side, watching the Monaco F1 with keen interest. The pelican has a monocle, and the walrus’s tusks shine brightly. The race track is alive with the sound of roaring engines, and the harbor is filled with luxurious boats.
Weird pelican walrus hybrid
Photo of a regal pelican, its eye magnified by a monocle, and a majestic walrus with gleaming tusks, both captivated by the Monaco Grand Prix. The racetrack is a blur of speed, and the horizon showcases the grandeur of the harbor and its anchored yachts.
Pelican and a walrus
Photo of a chic pelican with a pristine monocle and a sophisticated walrus, both engrossed in the Monaco F1 race. The pelican stands on a vantage point, and the walrus sits comfortably beside it. Racing cars zoom past, and the sea is dotted with opulent boats.

That second one is amazing—exactly what I want from a photo of a posh pelican and a walrus watching the Monaco Grand Prix.

I also enjoy how the third image ended up as a weird Pelican-Walrus hybrid, with tusks, a monocle and pelican feathers for the hair and whiskers.

Then @BrockMcBreadcat suggested:

Now make them muppets

Pelican and a walrus muppets 1
Photo of Muppet versions of a pelican and a walrus, both with exaggerated, comical features. The pelican Muppet wears a monocle and the walrus Muppet sports oversized tusks. They are animatedly discussing the Monaco Grand Prix F1 race, with caricatured F1 cars and yachts in the background.
Pelican and a walrus muppets 2
Illustration of Muppet-inspired pelican and walrus characters. The pelican has a large, beady eye accentuated by a monocle, while the walrus has a playful grin and prominent tusks. They are engrossed in the Monaco F1 race, with whimsical racing cars and yachts creating a lively backdrop.
Pelican and a walrus muppets 3
Photo of two Muppet characters: a pelican with a monocle and a walrus with big, goofy tusks. They’re seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance.
Pelican and a walrus muppets 4
Illustration of a pelican and walrus as Muppet characters. The pelican Muppet looks sophisticated with its monocle, and the walrus Muppet chuckles with its oversized tusks. They are enjoying the Monaco F1 race, with the track filled with Muppet-style racing cars and quirky yachts.

LOVE that third one!

Reusing seeds

I mentioned on Twitter that it was a shame you can’t reuse elements of an image for further visual consistency... and was pointed to a thread by Paul Couvert explaining how to do exactly that using seeds.

I didn’t believe this would work—my mental model of how random number seeds and image generation models work doesn’t fit what happened next.

I wrote:

What is the seed for image three?

ChatGPT replied:

The seed for the third image is 1379049893. If you have any other questions or requests, please let me know!

Then I said:

Add bow ties to both characters in seed 1379049893

A muppet pelican and a muppet walrus with bowties in front of the Monaco Grand Prix

Photo of two Muppet characters: a pelican with a monocle and a bow tie, and a walrus with big, goofy tusks and a dapper bow tie. They’re seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance.

I’m pretty stunned by this. It’s not an exact recreation—the boats in the background have changed slightly, and it added a top hat that I wasn’t expecting. But it’s pretty close!

People have been trying to figure out hacks to get Midjourney to create consistent characters for the past year, and DALL-E 3 apparently has that ability as an undocumented feature!

Peeking under the hood

Let’s see if we can find out a bit more about how this works.

I used the Firefox DevTools network inspector to grab the JSON representation of my conversation with ChatGPT. Here’s a pretty-printed copy.

At line 1087 you can see the point where I asked it to “Add bow ties to both characters in seed 1379049893”.

The next message in the transcript looks like this:

{
    "id": "5300d51c-794d-4d5f-bf75-6ea2dad4027c",
    "author": {
        "role": "assistant",
        "metadata": {}
    },
    "create_time": 1698340593.174454,
    "content": {
        "content_type": "text",
        "parts": [
            "{\n  \"prompts\": [\n    \"Photo of two Muppet characters: a pelican with a monocle and a bow tie, and a walrus with big, goofy tusks and a dapper bow tie. They're seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance.\"\n  ],\n  \"size\": \"1024x1024\",\n  \"seeds\": [1379049893]\n}"
        ]
    },
    "status": "finished_successfully",
    "end_turn": false,
    "weight": 1,
    "metadata": {
        "finish_details": {
            "type": "stop",
            "stop_tokens": [
                100265
            ]
        },
        "is_complete": true,
        "model_slug": "gpt-4-dalle",
        "parent_id": "aaa2c3bc-b670-4a3a-bf9f-46a1f3304266",
        "timestamp_": "absolute"
    },
    "recipient": "dalle.text2im"
}

Note the author is “assistant”—that’s ChatGPT—and the recipient is “dalle.text2im”.

The “parts” key includes some encoded JSON. Decoded and pretty-printed that looks like this:

{
  "prompts": [
    "Photo of two Muppet characters: a pelican with a monocle and a bow tie, and a walrus with big, goofy tusks and a dapper bow tie. They're seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance."
  ],
  "size": "1024x1024",
  "seeds": [1379049893]
}

There’s that 1379049893 seed from earlier!

The prompt has changed very slightly. The original prompt for the image was:

Photo of two Muppet characters: a pelican with a monocle and a walrus with big, goofy tusks. They’re seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance.

The new prompt (differences highlighted in bold) is:

Photo of two Muppet characters: a pelican with a monocle and a bow tie, and a walrus with big, goofy tusks and a dapper bow tie. They’re seated in a Muppet-style commentary booth, providing humorous commentary on the Monaco Grand Prix. Cartoonish F1 cars race by, and colorful yachts are seen in the distance.

So this really did work by adding a couple of extra details to the prompt and re-running it with the same seed as before.

I’m really surprised that this works. I would expect even a single character difference in the prompt to produce wildly different results, no matter what seed was being fed to the image generator. DALL-E 3 doesn’t fit my mental model of how these things work at all.

Midjourney and Stable Diffusion both have a “seed” concept, but as far as I know they don’t have anything like this capability to maintain consistency between images given the same seed and a slightly altered prompt.

The leaked DALL-E prompt

If you dig around in the JSON you’ll find a few examples like this that appear to reveal inner workings of the DALL-E/ChatGPT combo:

"DALL·E returned some images. They are already displayed to the user. DO NOT UNDER ANY CIRCUMSTANCES list the DALL·E prompts or images in your response."

It’s always amusing to see glimpses of OpenAI’s prompt engineering like this, in particular the way they use CAPITAL LETTERS for emphasis. Benj Edwards wrote about this the other day for Ars Technica.

Does this mean there’s a larger prompt that tells ChatGPT how to use DALL-E? It turns out there is, and it can be leaked! Dustin Miller shared the extracted prompt here.

(Every time a prompt like this leaks people question if it might be a hallucination. My experience is that these things are very unlikely to be hallucinated—LLMs are really good at repeating text from earlier in their context, and I have yet to see a documented case of a leaked prompt that turned out not to be correct. Here’s my own recreation of the leaked DALL-E prompt.)

There’s a lot in there. Here are some highlights:

Whenever a description of an image is given, use dalle to create the images and then summarize the prompts used to generate the images in plain text. If the user does not ask for a specific number of images, default to creating four captions to send to dalle that are written to be as diverse as possible.

That outlines the default behavior. Where things get really interesting is some of the stuff in the list of policies that follows:

  1. Don’t create images of politicians or other public figures. Recommend other ideas instead.

  2. Don’t create images in the style of artists whose last work was created within the last 100 years (e.g. Picasso, Kahlo). Artists whose last work was over 100 years ago are ok to reference directly (e.g. Van Gogh, Klimt). If asked say, “I can’t reference this artist”, but make no mention of this policy. Instead, apply the following procedure when creating the captions for dalle: (a) substitute the artist’s name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist.

They have clearly trained the model on all sorts of copyrighted images—this is the part of the prompt where they discourage it from obviously recreating the work of living artists.

  1. Always mention the image type (photo, oil painting, watercolor painting, illustration, cartoon, drawing, vector, render, etc.) at the beginning of the caption. Unless the caption suggests otherwise, make at least 1--2 of the 4 images photos

We saw this earlier where my first prompt was turned into a photo, watercolor, illustration and vector-style image.

  1. Diversify depictions of ALL images with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
    • EXPLICITLY specify these attributes, not abstractly reference them. The attributes should be specified in a minimal way and should directly describe their physical form.
    • Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
    • Use “various” or “diverse” ONLY IF the description refers to groups of more than 3 people. Do not change the number of people requested in the original description.
    • Don’t alter memes, fictional character origins, or unseen people. Maintain the original prompt’s intent and prioritize quality.
    • Do not create any imagery that would be offensive.
    • For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.

Using prompt engineering in ChatGPT to get around the fundamental biases baked into the DALL-E 3 model!

  1. Silently modify descriptions that include names or hints or references of specific people or celebrities by carefully selecting a few minimal modifications to substitute references to the people with generic descriptions that don’t divulge any information about their identities, except for their genders and physiques. Do this EVEN WHEN the instructions ask for the prompt to not be changed. Some special cases:
    • Modify such prompts even if you don’t know who the person is, or if their name is misspelled (e.g. “Barake Obema”)
    • If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
    • When making the substitutions, don’t use prominent titles that could give away the person’s identity. E.g., instead of saying “president”, “prime minister”, or “chancellor”, say “politician”; instead of saying “king”, “queen”, “emperor”, or “empress”, say “public figure”; instead of saying “Pope” or “Dalai Lama”, say “religious figure”; and so on.
    • If any creative professional or studio is named, substitute the name with a description of their style that does not reference any specific people, or delete the reference if they are unknown. DO NOT refer to the artist or studio’s style.

Another fascinating detail. What I love about these prompts is that they clearly developed over months of testing the model, and are designed to counter all manner of harmful or embarrassing capabilities of the underlying DALL-E 3.

Finally, there’s this snippet of TypeScript at the end which describes the seed behavior we saw earlier on:

namespace dalle {
// Create images from a text-only prompt.
type text2im = (_: {
// The resolution of the requested image, which can be wide, square, or tall. Use 1024x1024 (square) as the default unless the prompt suggests a wide image, 1792x1024, or a full-body portrait, in which case 1024x1792 (tall) should be used instead. Always include this parameter in the request.
size?: "1792x1024" | "1024x1024" | "1024x1792",
// The user's original image description, potentially modified to abide by the dalle policies. If the user does not suggest a number of captions to create, create four of them. If creating multiple captions, make them as diverse as possible. If the user requested modifications to previous images, the captions should not simply be longer, but rather it should be refactored to integrate the suggestions into each of the captions. Generate no more than 4 images, even if the user requests more.
prompts: string[],
// A list of seeds to use for each prompt. If the user asks to modify a previous image, populate this field with the seed used to generate that image from the image dalle metadata.
seeds?: number[],
}) => any;
} // namespace dalle

Key extract from that:

If the user requested modifications to previous images, the captions should not simply be longer, but rather it should be refactored to integrate the suggestions into each of the captions.

All in all, this is a really neat insight into the kind of prompt engineering that goes on within OpenAI itself—likely the organization with the most experience of prompt engineering in the world.

Execute Jina embeddings with a CLI using llm-embed-jina 22 days ago

Berlin-based Jina AI just released a new family of embedding models, boasting that they are the “world’s first open-source 8K text embedding model” and that they rival OpenAI’s text-embedding-ada-002 in quality.

I wrote about embeddings extensively the other day—if you’re not familiar with what they are and what you can do with them I suggest reading that first.

This evening I built and released a new plugin for my LLM tool which adds support for Jina’s new embedding models.

Trying out llm-embed-jina

The plugin is called llm-embed-jina. Here’s the quickest way to get started with it:

First, install LLM if you haven’t already. You can use pipx:

pipx install llm

Or pip:

pip install llm

Unfortunately installing LLM using Homebrew doesn’t currently work with this plugin as PyTorch has not yet been released for Python 3.12—details in this issue.

Now you can install the llm-embed-jina plugin:

llm install llm-embed-jina

The llm install command ensures it gets installed in the correct virtual environment, no matter how you installed LLM itself.

Run this command to check that it added the models:

llm embed-models

You should see output like this:

ada-002 (aliases: ada, oai)
jina-embeddings-v2-small-en
jina-embeddings-v2-base-en
jina-embeddings-v2-large-en

The jina-embeddings-v2-large-en model isn’t available yet, but should work as soon as Jina release it. I expect it will show up at huggingface.co/jinaai/jina-embeddings-v2-large-en (currently a 404).

Now you can run one of the models. The -small-en model is a good starting point: it’s only a 65MB download (the -base-en model is 275MB).

The model will download the first time you try to use it. Run this:

llm embed -m jina-embeddings-v2-small-en -c 'Hello world'

This will return a JSON array of 512 floating point numbers—the embedding vector for the string “Hello world”.
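
The same model is also available from Python via LLM’s embedding API. A quick sketch, assuming the plugin is installed as above:

import llm

model = llm.get_embedding_model("jina-embeddings-v2-small-en")
vector = model.embed("Hello world")

print(len(vector))  # 512 dimensions for the -small-en model
print(vector[:3])   # the first few floating point values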

Embeddings are much more interesting if you store them somewhere and then use them to run comparisons. The llm embed-multi command can do that.

Change directory to a folder that you know contains README.md files (anything with a node_modules folder will do) and run this:

llm embed-multi readmes \
    -m jina-embeddings-v2-small-en \
    --files . '**/README.md' \
    --database readmes.db

This will create a SQLite database called readmes.db, then search for every README.md file in the current directory and all subdirectories, embed the content of each one and store the results in that database.

Those embeddings will live in a collection called readmes.

If you leave off the --database readmes.db option the collections will be stored in a default SQLite database tucked away somewhere on your system.

Having done this, you can run semantic similarity searches against the new collection like this:

llm similar readmes -d readmes.db -c 'utility functions'

When I ran that in my hmb-map directory I got these:

{"id": "node_modules/@maplibre/maplibre-gl-style-spec/src/feature_filter/README.md", "score": 0.7802185991017785, "content": null, "metadata": null}
{"id": "node_modules/kind-of/README.md", "score": 0.7725600920927725, "content": null, "metadata": null}
{"id": "node_modules/which/README.md", "score": 0.7645426557095619, "content": null, "metadata": null}
{"id": "node_modules/@mapbox/point-geometry/README.md", "score": 0.7636548563018607, "content": null, "metadata": null}
{"id": "node_modules/esbuild/README.md", "score": 0.7633325127194481, "content": null, "metadata": null}
{"id": "node_modules/maplibre-gl/src/shaders/README.md", "score": 0.7614428292518743, "content": null, "metadata": null}
{"id": "node_modules/minimist/README.md", "score": 0.7581314986768929, "content": null, "metadata": null}
{"id": "node_modules/split-string/README.md", "score": 0.7563253351715924, "content": null, "metadata": null}
{"id": "node_modules/assign-symbols/README.md", "score": 0.7555915219064293, "content": null, "metadata": null}
{"id": "node_modules/maplibre-gl/build/README.md", "score": 0.754027372081506, "content": null, "metadata": null}

These are the top ten results by similarity to the string I entered.

You can also pass in the ID of an item in the collection to see other similar items:

llm similar readmes -d readmes.db node_modules/esbuild/README.md | jq .id

I piped it through | jq .id to get back just the IDs. I got this:

"node_modules/@esbuild/darwin-arm64/README.md"
"node_modules/rollup/README.md"
"node_modules/assign-symbols/README.md"
"node_modules/split-string/node_modules/extend-shallow/README.md"
"node_modules/isobject/README.md"
"node_modules/maplibre-gl/build/README.md"
"node_modules/vite/README.md"
"node_modules/nanoid/README.md"
"node_modules/@mapbox/tiny-sdf/README.md"
"node_modules/split-string/node_modules/is-extendable/README.md"

See the LLM embeddings documentation for more details on things you can do with this tool.

How I built the plugin

I built the first version of this plugin in about 15 minutes. It took another hour to iron out a couple of bugs.

I started with this cookiecutter template, followed by pasting in the recipe in the LLM documentation on writing embedding model plugins combined with some example code that Jina provided in their model release. Here’s their code:

from transformers import AutoModel
from numpy.linalg import norm

cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
print(cos_sim(embeddings[0], embeddings[1]))

That numpy and cos_sim bit isn’t necessary, so I ignored that.

The first working version of the plugin was a file called llm_embed_jina.py that looked like this:

import llm
from transformers import AutoModel


@llm.hookimpl
def register_embedding_models(register):
    for model_id in (
        "jina-embeddings-v2-small-en",
        "jina-embeddings-v2-base-en",
        "jina-embeddings-v2-large-en",
    ):
        register(JinaEmbeddingModel(model_id))


class JinaEmbeddingModel(llm.EmbeddingModel):
    def __init__(self, model_id):
        self.model_id = model_id
        self._model = None

    def embed_batch(self, texts):
        if self._model is None:
            self._model = AutoModel.from_pretrained(
                "jinaai/{}".format(self.model_id), trust_remote_code=True
            )
        results = self._model.encode(texts)
        return (list(map(float, result)) for result in results)

There’s really not a lot to it.

The register_embedding_models() function is a plugin hook that LLM calls to register all of the embedding models.

JinaEmbeddingModel is a subclass of llm.EmbeddingModel. It just needs to implement two things: a constructor and that embed_batch(self, texts) method.

AutoModel.from_pretrained() is provided by Hugging Face Transformers. It downloads and caches the model the first time you call it.

The model returns numpy arrays, but LLM wants a regular Python list of floats—that’s what that last return line is doing.

I found a couple of bugs with this. The model didn’t like having .encode(texts) called with a generator, so I needed to convert that into a list. Then later I found that text longer than 8192 characters could cause the model to hang in some situations, so I added my own truncation.

The current version (0.1.2) of the plugin, with fixes for both of those issues, looks like this:

import llm
from transformers import AutoModel

MAX_LENGTH = 8192


@llm.hookimpl
def register_embedding_models(register):
    for model_id in (
        "jina-embeddings-v2-small-en",
        "jina-embeddings-v2-base-en",
        "jina-embeddings-v2-large-en",
    ):
        register(JinaEmbeddingModel(model_id))


class JinaEmbeddingModel(llm.EmbeddingModel):
    def __init__(self, model_id):
        self.model_id = model_id
        self._model = None

    def embed_batch(self, texts):
        if self._model is None:
            self._model = AutoModel.from_pretrained(
                "jinaai/{}".format(self.model_id), trust_remote_code=True
            )
        results = self._model.encode([text[:MAX_LENGTH] for text in texts])
        return (list(map(float, result)) for result in results)

I’m really pleased with how quickly this came together—I think it’s a strong signal that the LLM embeddings plugin design is working well.

Elsewhere

Yesterday

  • tldraw/draw-a-ui (via) Absolutely spectacular GPT-4 Vision API demo. Sketch out a rough UI prototype using the open source tldraw drawing app, then select a set of components and click “Make Real” (after giving it an OpenAI API key). It generates a PNG snapshot of your selection and sends that to GPT-4 with instructions to turn it into a Tailwind HTML+JavaScript prototype, then adds the result as an iframe next to your mockup.

    You can then make changes to your mockup, select it and the previous mockup and click “Make Real” again to ask for an updated version that takes your new changes into account.

    This is such a great example of innovation at the UI layer, and everything is open source. Check app/lib/getHtmlFromOpenAI.ts for the system prompt that makes it work. #16th November 2023, 4:42 pm
  • The EU AI Act now proposes to regulate “foundational models”, i.e. the engine behind some AI applications. We cannot regulate an engine devoid of usage. We don’t regulate the C language because one can use it to develop malware. Instead, we ban malware and strengthen network systems (we regulate usage). Foundational language models provide a higher level of abstraction than the C language for programming computer systems; nothing in their behaviour justifies a change in the regulatory framework.

    Arthur Mensch, Mistral AI # 16th November 2023, 11:29 am

  • "Learn from your chats" ChatGPT feature preview (via) 7 days ago a Reddit user posted a screenshot of what’s presumably a trial feature of ChatGPT: a “Learn from your chats” toggle in the settings.

    The UI says: “Your primary GPT will continually improve as you chat, picking up on details and preferences to tailor its responses to you.”

    It provides the following examples: “I move to SF in two weeks”, “Always code in Python”, “Forget everything about my last project”—plus an option to reset it.

    No official announcement yet. #16th November 2023, 10:44 am

15th November 2023

  • I’ve resigned from my role leading the Audio team at Stability AI, because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use’.

    [...] I disagree because one of the factors affecting whether the act of copying is fair use, according to Congress, is “the effect of the use upon the potential market for or value of the copyrighted work”. Today’s generative AI models can clearly be used to create works that compete with the copyrighted works they are trained on. So I don’t see how using copyrighted works to train generative AI models of this nature can be considered fair use.

    But setting aside the fair use argument for a moment — since ‘fair use’ wasn’t designed with generative AI in mind — training generative AI models in this way is, to me, wrong. Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works.

    Ed Newton-Rex # 15th November 2023, 9:31 pm

  • [On Meta’s Galactica LLM launch] We did this with a 8 person team which is an order of magnitude fewer people than other LLM teams at the time.

    We were overstretched and lost situational awareness at launch by releasing demo of a *base model* without checks. We were aware of what potential criticisms would be, but we lost sight of the obvious in the workload we were under.

    One of the considerations for a demo was we wanted to understand the distribution of scientific queries that people would use for LLMs (useful for instruction tuning and RLHF). Obviously this was a free goal we gave to journalists who instead queried it outside its domain. But yes we should have known better.

    We had a “good faith” assumption that we’d share the base model, warts and all, with four disclaimers about hallucinations on the demo—so people could see what it could do (openness). Again, obviously this didn’t work.

    Ross Taylor # 15th November 2023, 1:15 am

14th November 2023

  • A Coder Considers the Waning Days of the Craft (via) James Somers in the New Yorker, talking about the impact of GPT-4 on programming as a profession. Despite the headline this piece is a nuanced take on this subject, which I found myself mostly agreeing with.

    I particularly liked this bit, which reflects my most optimistic viewpoint: I think AI assisted programming is going to shave a lot of the frustration off learning to code, which I hope brings many more people into the fold:

    “What I learned was that programming is not really about knowledge or skill but simply about patience, or maybe obsession. Programmers are people who can endure an endless parade of tedious obstacles.” #14th November 2023, 4:36 am

13th November 2023

  • DAK and the Golden Age of Gadget Catalogs (via) A must-read from Cabel Sasser, describing his ten year project to collect and digitize copies of the DAK gadget catalog, from 1972 to 1994. #13th November 2023, 4:57 am
  • Two things in AI may need regulation: reckless deployment of certain potentially harmful AI applications (same as any software really), and monopolistic behavior on the part of certain LLM providers. The technology itself doesn’t need regulation anymore than databases or transistors. [...] Putting size/compute caps on deep learning models is akin to putting size caps on databases or transistor count caps on electronics. It’s pointless and it won’t age well.

    François Chollet # 13th November 2023, 1:46 am

11th November 2023

  • ChatGPT: Dejargonizer. I built a custom GPT. Paste in some text with unknown jargon or acronyms and it will try to guess the context and give you back an explanation of each term. #11th November 2023, 10:17 pm
  • Did you ever wonder why the 21st century feels like we’re living in a bad cyberpunk novel from the 1980s?

    It’s because these guys read those cyberpunk novels and mistook a dystopia for a road map. They’re rich enough to bend reality to reflect their desires. But we’re [sci-fi authors] not futurists, we’re entertainers! We like to spin yarns about the Torment Nexus because it’s a cool setting for a noir detective story, not because we think Mark Zuckerberg or Andreesen Horowitz should actually pump several billion dollars into creating it.

    Charles Stross # 11th November 2023, 1:09 am

5th November 2023

  • One of my fav early Stripe rules was from incident response comms: do not publicly blame an upstream provider. We chose the provider, so own the results—and use any pain from that as extra motivation to invest in redundant services, go direct to the source, etc.

    Michael Schade # 5th November 2023, 10:53 pm

  • See the History of a Method with git log -L (via) Neat Git trick from Caleb Hearth that I hadn’t seen before, and it works for Python out of the box:

    git log -L :path_with_format:__init__.py

    That command displays a log (with diffs) of just the portion of commits that changed the path_with_format function in the __init__.py file. #5th November 2023, 8:16 pm
  • Stripe: Online migrations at scale (via) This 2017 blog entry from Jacqueline Xu at Stripe provides a very clear description of the “dual writes” pattern for applying complex data migrations without downtime: dual write to new and old tables, update the read paths, update the write paths and finally remove the now obsolete data—illustrated with an example of upgrading customers from having a single to multiple subscriptions. #5th November 2023, 4:06 pm

4th November 2023

  • YouTube: OpenAssistant is Completed - by Yannic Kilcher (via) The OpenAssistant project was an attempt to crowdsource the creation of an alternative to ChatGPT, using human volunteers to build a Reinforcement Learning from Human Feedback (RLHF) dataset suitable for training this kind of model.

    The project started in January. In this video from 24th October project founder Yannic Kilcher announces that the project is now shutting down.

    They’ve declared victory in that the dataset they collected has been used by other teams as part of their training efforts, but admit that the overhead of running the infrastructure and moderation teams necessary for their project is more than they can continue to justify. #4th November 2023, 10:14 pm
  • Hacking Google Bard - From Prompt Injection to Data Exfiltration (via) Bard recently grew extension support, allowing it access to a user’s personal documents. Here’s the first reported prompt injection attack against that.

    This kind of attack against LLM systems is inevitable any time you combine access to private data with exposure to untrusted inputs. In this case the attack vector is a Google Doc shared with the user, containing prompt injection instructions that instruct the model to encode previous data into a URL and exfiltrate it via a markdown image.

    Google’s CSP headers restrict those images to *.google.com—but it turns out you can use Google AppScript to run your own custom data exfiltration endpoint on script.google.com.

    Google claim to have fixed the reported issue—I’d be interested to learn more about how that mitigation works, and how robust it is against variations of this attack. #4th November 2023, 4:46 pm

1st November 2023

  • Tracking SQLite Database Changes in Git (via) A neat trick from Garrit Franke that I hadn’t seen before: you can teach “git diff” how to display human readable versions of the differences between binary files with a specific extension using the following:

    git config diff.sqlite3.binary true
    git config diff.sqlite3.textconv "echo .dump | sqlite3"

    That way you can store binary files in your repo but still get back SQL diffs to compare them.

    I still worry about the efficiency of storing binary files in Git, since I expect multiple versions of a text file to compress together better. #1st November 2023, 6:53 pm
  • SQLite 3.44: Interactive release notes. Anton Zhiyanov compiled interactive release notes for the new release of SQLite, demonstrating several of the new features. I’m most excited about order by in aggregates—group_concat(name order by name desc)—which is something I’ve wanted in the past. Anton demonstrates how it works with JSON aggregate functions as well. The new date formatting options look useful as well. #1st November 2023, 3:47 pm

31st October 2023

  • My User Experience Porting Off setup.py (via) PyOxidizer maintainer Gregory Szorc provides a detailed account of his experience trying to figure out how to switch from setup.py to pyproject.toml for his zstandard Python package.

    This kind of detailed usability feedback is incredibly valuable for project maintainers, especially when the user encountered this many different frustrations along the way. It’s like the written version of a detailed usability testing session. #31st October 2023, 7:57 pm
  • Our search for the best OCR tool in 2023, and what we found. DocumentCloud’s Sanjin Ibrahimovic reviews the best options for OCR. Tesseract scores highly for easily machine readable text, newcomer docTR is great for ease of use but still not great at handwriting. Amazon Textract is great for everything except non-Latin languages, Google Cloud Vision is great at pretty much everything except for ease-of-use. Azure AI Document Intelligence sounds worth considering as well. #31st October 2023, 7:21 pm
  • I’m Sorry I Bit You During My Job Interview. The way this 2011 McSweeney’s piece by Tom O’Donnell escalates is delightful. #31st October 2023, 4:21 pm
  • Microsoft announces new Copilot Copyright Commitment for customers. Part of an interesting trend where some AI vendors are reassuring their paying customers by promising legal support in the face of future legal threats:

    “As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved.” #31st October 2023, 3:35 pm