essay, Uncategorized

Interactive navigation in embedding space

tdjgordon, Pixabay

Discussed a fascinating idea for a foundation model tool at lunch today: interactive navigation in embedding space.

Right now, you prompt most generative models with human language. That works, but it’s imprecise and coarse. If you’re generating an image of an outdoor scene and you want the sunlight ever so slightly brighter, you could add the phrase “and ever so slightly brighter” to the prompt, and it might work, but it’s clunky and clearly doesn’t scale. Maybe good enough for recreational use cases; not professional grade.

What you really want to do is move your target in embedding space directly, without the lossy indirection of going through human language. Ideally, you’d have a dial that mapped directly to sunlight luminance, and you could bump it up just a bit. Similar to temperature for LLMs, but for fine control over direction and distance in high-dimensional embedding space, as opposed to overall stochasticity.
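Here is a minimal sketch of such a dial. It assumes we already have the image’s embedding as a plain vector, plus a direction vector for the concept. The 4-dimensional `scene` and `sunlight` vectors are hypothetical toy values. Finding a real model’s “sunlight” direction is the hard part, and it isn’t shown here. Normalizing the direction turns the dial setting into a distance, and that is what separates this knob from temperature.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dial setting is a distance."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be nonzero")
    return [x / norm for x in v]


def turn_dial(embedding, direction, amount):
    """Move `embedding` by `amount` units along `direction`.

    Temperature changes how random sampling is; this instead changes where
    the target sits in embedding space. The direction picks the concept,
    and the amount picks how far to move.
    """
    unit = normalize(direction)
    return [e + amount * d for e, d in zip(embedding, unit)]


# Hypothetical 4-dimensional embedding and "sunlight" direction (toy values).
scene = [0.2, -0.5, 0.9, 0.1]
sunlight = [0.0, 3.0, 0.0, 4.0]  # normalizes to [0, 0.6, 0, 0.8]

brighter = turn_dial(scene, sunlight, 0.05)  # "ever so slightly brighter"
print([round(x, 3) for x in brighter])  # prints [0.2, -0.47, 0.9, 0.14]
```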

Imagine a big mixing board at a professional music studio. You generate an initial image as a starting point, and the model analyzes it and gives you the top 100 principal components as vectors in embedding space, each grounded to the closest embedding and the human word that describes it. Smiles, spikiness, wood, buttons, height above the ground, crowd density, all sorts of concepts, each with a knob you can dial up or down. They won’t be entirely independent, so cranking up smiles may also move the warmth, happiness, and sociability knobs, which is ok.
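Here is a minimal sketch of the mixing board in plain Python. It rests on assumptions the post leaves open. First, it runs PCA over a set of embeddings, here hypothetical variations of the generated image. Second, it labels each knob with the vocabulary word whose embedding best aligns with that component. The 3-dimensional `samples` and `vocab` are made-up toy data. A real system would use the model’s own embeddings, such as a shared text/image space, with far more dimensions and knobs.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def covariance(vectors):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    mu = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_components(vectors, k, iters=200, seed=0):
    """Top-k principal components via power iteration. Each new component
    is kept orthogonal to the ones already found. Assumes k doesn't exceed
    the number of directions with nonzero variance."""
    cov = covariance(vectors)
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in cov]
        for _ in range(iters):
            v = [dot(row, v) for row in cov]  # one power-iteration step
            for c in comps:                   # project out earlier components
                p = dot(v, c)
                v = [vi - p * ci for vi, ci in zip(v, c)]
            v = unit(v)
        comps.append(v)
    return comps


def label_components(comps, vocab):
    """Ground each component to the most aligned vocabulary word. This uses
    absolute cosine similarity, since a component's sign is arbitrary."""
    knobs = []
    for c in comps:
        word = max(vocab, key=lambda w: abs(dot(c, unit(vocab[w]))))
        if dot(c, vocab[word]) < 0:
            c = [-x for x in c]  # flip so turning the knob up means "more <word>"
        knobs.append((word, c))
    return knobs


def mix(embedding, knobs, settings):
    """Apply mixing-board settings, given as {word: amount}, to an embedding."""
    out = list(embedding)
    for word, c in knobs:
        amount = settings.get(word, 0.0)
        out = [o + amount * ci for o, ci in zip(out, c)]
    return out


# Hypothetical toy data: variations of one image, and word embeddings.
samples = [
    [2.0, 0.5, 0.0],
    [-2.0, -0.5, 0.0],
    [1.0, -1.0, 0.1],
    [-1.0, 1.0, -0.1],
]
vocab = {"smiles": [1.0, 0.0, 0.0], "wood": [0.0, 1.0, 0.0], "height": [0.0, 0.0, 1.0]}

knobs = label_components(top_components(samples, k=2), vocab)
print([word for word, _ in knobs])  # prints ['smiles', 'wood']
edited = mix(samples[0], knobs, {"smiles": 0.5})
```

Principal components are orthogonal, so these knobs are independent as directions in embedding space. The entanglement described above comes from the labels instead. A knob grounded to “smiles” can still sit close to the embeddings for “warmth” or “happiness”, so turning it up moves those concepts too.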

It’s a complicated UI! Definitely not as approachable as “just type into the text box.” And typing into a text box has gotten us pretty far! But if we’re stuck with human language as our main interface to generative models, that’s extremely limiting. Professionals won’t tolerate that for long; they need more powerful, fine-grained interfaces that give them a high degree of interactive control and ability to iterate. Language prompts may have gotten us here, as they say, but they may not get us there.

AI researchers will note that this has lots of prior art in grounding and interpretability, among other areas. I’m no expert, so I’d love to hear any thoughts!

Enjoyed this early proposal for an alternative to BGP/OSPF from way back in 1964. Even more bottom-up! ASes don’t claim routes or prefixes at all; they just send packets with source IPs and per-hop TTLs, and shortest paths converge backward based on those.

Assuming symmetrical bi-directional links, the postman can infer the “best” paths to transmit mail to any station merely by looking at the cancellation time or the equivalent handover number tag. If the postman sitting in the center of the United States received letters from San Francisco, he would find that letters from San Francisco arriving from channels to the west would come in with later cancellation dates than if such letters had arrived in a roundabout manner from the east. Each letter carries an implicit indication of its length of transmission path. The astute postman can then deduce that the best channel to send a message to San Francisco is probably the link associated with the latest cancellation dates of messages from San Francisco. By observing the cancellation dates for all letters in transit, information is derived to route future traffic. The return address and cancellation date of recent letters is sufficient to determine the best direction in which to send subsequent letters.
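Here is a toy simulation of the postman’s rule in Python. It assumes symmetric links and uses a hop count in place of the cancellation date: fewer handovers means a later cancellation date, and so a shorter path. The five-station ring and its city names are hypothetical. To generate traffic, each station floods one message. That flooding is an assumption of this sketch; the rule itself only needs some recent letters from each source. Every station keeps, for each source, the link that delivered the fewest-hop copy, and it later sends mail to that source over that link.

```python
class Station:
    """A post office that learns routes by watching incoming mail.

    For every source it hears from, it remembers which link delivered that
    source's messages with the fewest handovers (hops). Assuming links are
    symmetric, that same link is the best one for sending mail back.
    """

    def __init__(self, name):
        self.name = name
        self.best = {}  # source -> (fewest hops seen, link it arrived on)

    def observe(self, source, link, hops):
        seen = self.best.get(source)
        if seen is None or hops < seen[0]:
            self.best[source] = (hops, link)

    def route_to(self, dest):
        entry = self.best.get(dest)
        return entry[1] if entry else None


def flood(stations, links, source, ttl):
    """Send one message from `source` on every link. Each station records
    the copy it receives, then forwards it on its other links until the
    hop budget (`ttl`) runs out."""
    pending = [(source, None, 0)]  # (station holding a copy, where it came from, hops)
    while pending:
        node, came_from, hops = pending.pop()
        if hops >= ttl:
            continue
        for nbr in links[node]:
            if nbr in (came_from, source):
                continue
            stations[nbr].observe(source, link=node, hops=hops + 1)
            pending.append((nbr, node, hops + 1))


# Hypothetical topology: Kansas sits in the middle, with a short path west
# to SF via Denver and a roundabout one east via Boston and Atlanta.
edges = [("SF", "Denver"), ("Denver", "Kansas"), ("Kansas", "Boston"),
         ("Boston", "Atlanta"), ("Atlanta", "SF")]
links = {}
for a, b in edges:
    links.setdefault(a, set()).add(b)
    links.setdefault(b, set()).add(a)
stations = {name: Station(name) for name in links}

for name in stations:
    flood(stations, links, name, ttl=4)

print(stations["Kansas"].route_to("SF"))  # prints Denver
```

No station ever announces a route. Each table is built only from the return address and hop count on the mail it has seen.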

Tech pace layers

I’ve been a fan of Stewart Brand‘s Pace Layering for decades now. Really great framework for thinking about how different ecosystems and emergent forces interact. I’ve been thinking about a tech version of it for the better part of a year, and I finally took advantage of the holiday break to bang out a rough draft. Thoughts?

Pace layering diagram: fashion, commerce, infrastructure, governance, culture, nature.

Tech pace layering diagram: product, components, organizations, standards, computer science / electrical engineering, math / physics.

Product includes devices like Xbox, TiVo, and PalmPilot; apps like Firefox, MS Office, and Lotus 1-2-3; and services like Google, Facebook, and Wikipedia.

Components include libraries and frameworks: glibc, LLVM, Django, React, Docker, Arduino, etc.

Organizations involve some form of human governance, e.g. companies like Bell Labs, IBM, Microsoft, and ARM; non-profits like ICANN, the FSF, and the Linux and Apache foundations; and standards bodies like IETF, W3C, ECMA, and OASIS.

Standards may be open (via standards bodies), proprietary to individual companies, or de facto. Examples include networking protocols like TCP/IP, HTTP, and SMTP; file formats like HTML, JPEG, and WAV; character encodings like ASCII, ISO 8859-1, and Unicode; operating system interfaces like Win32, POSIX, and Cocoa; and hardware languages and platforms like Verilog, VHDL, CUDA, FPGAs, etc.

Computer science and electrical engineering are the academic fields that provide the direct foundations for software and hardware, respectively, with math and physics underneath them. Number theory and cryptography, information theory, combinatorics, Boolean logic, digital and analog circuit design, and arguably even materials science processes like EUV lithography all live here.

I’m far from the first to think along these lines. See, for example, Erik Samsoe on Twitter (with Brand himself), Dmitri Glazkov’s Forces of the pace layering confusion, and Gartner’s Pace-layered Application Strategy. Taking a wider view, the classic 7-layer OSI network model and the 4-layer IETF model are a form of pace layering applied to networking protocols.
