Agent Proprioception: How Declarative Files Build the Self-Model
How agents build identity, continuity, and self-awareness through declarative files. The boot sequence, the compaction problem, file taxonomy, and why the files are the firmware.

Close your eyes. Touch your nose.

You didn't need to see your hand to know where it was. You didn't need a mirror to know where your nose was. You have proprioception: the sense of your own body's position, movement, and state without external observation. It's the sense that lets you walk without watching your feet, type without looking at the keyboard, reach for a coffee mug while reading. It's so fundamental that you don't notice it until it's gone: patients who lose proprioception describe the experience as catastrophic. Their limbs still work. They just don't know where they are.

AI agents, as they're commonly deployed, have no proprioception. They wake up in a context window with no body, no location, no history, no identity, no understanding of what they can do or who they're doing it for. They're a disembodied intelligence floating in text. They can reason brilliantly about the content in front of them, but they don't know who they are, where they are, what they've done before, or what they're supposed to be doing.

This is why most agent interactions feel generic, brittle, and context-free. It's not a capability problem: the reasoning is there. It's a proprioception problem. The agent has no self-model.

The declarative files that surround an agent (IDENTITY.md, SOUL.md, USER.md, AGENTS.md, MEMORY.md, TOOLS.md, and the topology of the system itself) aren't configuration. They're proprioception. They're the sense organs that let the agent know where it is, what it is, and how it relates to everything around it.

And without them, the agent cannot cross either of Norman's two gulfs.

The Two Gulfs, Revisited

Don Norman's seven-stage model of action describes how any actor, human or otherwise, moves from intent to outcome: goal, plan, specify, perform, perceive, interpret, compare.

Between intent and action lies the Gulf of Execution: the gap between what you want to do and figuring out how to do it. Between action and understanding lies the Gulf of Evaluation: the gap between what happened and understanding whether it achieved your goal.
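The seven stages and the two gulfs can be written down as a small data structure, which is handy later when attaching file layers to stages. This is only a sketch: the stage names follow the ones used in this article's walkthrough, and `Stage`, `EXECUTION`, `EVALUATION`, and `gulf_of` are illustrative names, not part of any framework.

```python
from enum import Enum


class Stage(Enum):
    """Norman's seven stages of action, in order."""
    GOAL = 1
    PLAN = 2
    SPECIFY = 3
    PERFORM = 4
    PERCEIVE = 5
    INTERPRET = 6
    COMPARE = 7


# Stages on the execution side (turning intent into action).
EXECUTION = (Stage.PLAN, Stage.SPECIFY, Stage.PERFORM)
# Stages on the evaluation side (turning results into understanding).
EVALUATION = (Stage.PERCEIVE, Stage.INTERPRET, Stage.COMPARE)


def gulf_of(stage: Stage) -> str:
    """Return which gulf a stage helps bridge; the goal sits above both."""
    if stage in EXECUTION:
        return "execution"
    if stage in EVALUATION:
        return "evaluation"
    return "goal"
```

For example, `gulf_of(Stage.SPECIFY)` returns `"execution"` and `gulf_of(Stage.COMPARE)` returns `"evaluation"`.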
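For humans interacting with physical objects, Norman showed how affordances and signifiers bridge these gulfs. A well-designed door handle bridges the Gulf of Execution (you can see how to operate it) and the Gulf of Evaluation (you can feel it move and see the door open).

For agents, the gulfs are wider and the bridges are different. An agent's "affordances" are its tools, APIs, and capabilities. Its "signifiers" are the descriptions, schemas, and documentation that explain those capabilities. But there's a prerequisite that Norman could take for granted with human users and that agents lack entirely:

The actor must know who it is and where it is before it can form meaningful goals or evaluate outcomes.

A human approaching a door already knows: I'm a person, I have hands, I'm standing in a hallway, I want to get to the room on the other side. This self-knowledge is so automatic, so proprioceptive, that Norman didn't need to address it. The human's self-model is built-in.

An agent has none of this. And that's what declarative files provide.

The Proprioceptive Stack

When I first started thinking about agent topology files, the concept was literal: a topology.md that maps the agent's electronic environment. You're running on Fedora Silverblue. Your workspace is here. Your code is in this GitHub repo. You're talking through Discord. Your human is in Nashville. Your tools are these CLIs. This is your pwd.

That's still valuable: it's the agent's exteroception, its sense of the external environment. But what emerged as we built these systems is that the declarative file structure does something deeper. It provides the full proprioceptive stack: identity, relationship, boundaries, capability, memory, and topology.

Layer 1: Identity ("What Am I?")

Files: IDENTITY.md, SOUL.md

This is the most fundamental layer. Before an agent can form any goal, it needs a sense of self. Not in a philosophical sense: in a functional sense. What kind of actor am I? What are my values? What's my disposition? What's my name?

Without IDENTITY.md, the agent is a generic language model. It can do anything, which means it has no basis for choosing what to do. It's all capability, no intent. The Gulf of Execution is infinite: not because the agent can't act, but because it has no self-referential framework for deciding how to act.

SOUL.md goes deeper than persona. It encodes values: what does this agent care about? "Be genuinely helpful, not performatively helpful." "Have opinions." "Do the actual work." These aren't style preferences. They're the agent's value hierarchy, the thing it uses in Stage 7 (Compare) to evaluate whether its output was good, not just correct.

Consider the difference:

Without identity: "Process this document." The agent processes the document competently but generically: no personality, no values applied, no judgment about what matters.

With identity: "Process this document."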
The agent, knowing it's Kitt, knowing it's direct and resourceful, knowing it values competence over performance, processes the document with specific judgment. It flags things a cautious, competent assistant would flag. It skips the filler a direct communicator would skip.

Proprioceptive function: Identity answers "what am I?" the way your sense of your own body answers "what shape am I?" It's the precondition for every subsequent decision.

Layer 2: Relationship ("Who Am I In Relation To?")

Files: USER.md, group context, conversation metadata

An agent doesn't exist in isolation. It exists in relationship to its human (or humans). This is proprioception in the relational sense, like knowing where your hand is relative to the object you're reaching for. The agent needs to know not just "who am I" but "who am I to them." USER.md encodes that relationship: who is this person, what do they need, how do they communicate, what are their constraints?

In a hypothetical system, USER.md might include: "Prefers concise communication. Non-native English speaker. Timezone: UTC+9." That's not a data point. It's a relational affordance. It changes how the agent acts at every stage of the seven-stage cycle.

Without the relational layer, the agent bridges the Gulf of Execution mechanically: it can do the thing. But it can't bridge the Gulf of Evaluation meaningfully: it doesn't know what "good" means for this person.

Proprioceptive function: Relationship answers "where am I relative to what I'm interacting with?" Like knowing how far your hand is from the coffee mug.

Layer 3: Boundaries ("What Are My Rules?")

Files: AGENTS.md, safety rules, policy constraints

Every actor operates within constraints. Humans have physical constraints (can't fly), social constraints (shouldn't steal), and institutional constraints (must follow company policy). These constraints are part of the self-model: you don't plan actions that violate constraints you've internalized.

For agents, AGENTS.md serves this function. "Never commit sensitive files to git." "Never install software before security review." "Never run destructive commands without asking." "Trash > rm."

These aren't just rules: they're proprioceptive boundaries. They define the edges of the agent's action space the way your sense of joint range-of-motion defines the edges of your physical action space. You don't try to rotate your arm 360° because you know, proprioceptively, that your shoulder doesn't do that.

An agent without boundary awareness plans freely and then hits walls: safety violations, permission errors, angry humans. An agent with internalized boundaries doesn't plan those actions in the first place.
The Gulf of Execution narrows because the action space is appropriately constrained.

Proprioceptive function: Boundaries answer "what is my range of motion?" Like knowing how far you can reach without overextending.

Layer 4: Capability ("What Can I Do?")

Files: TOOLS.md, skill definitions (SKILL.md), tool schemas

This is the layer closest to traditional affordance design. The agent needs to know what tools it has, what they do, how to invoke them, and what their limitations are. TOOLS.md provides the local, environment-specific knowledge: "Dev container is accessed via distrobox enter dev." "SurrealDB is at localhost:8000." "Chromium is a Flatpak." Skill files provide structured workflows: "To check the weather, run this command with these parameters."

Without this layer, the agent faces a Gulf of Execution that's entirely about discovery: it knows what it wants to do but doesn't know how. With it, the gulf narrows to the gap between the agent's current task and its known capabilities. That gap is much smaller and often zero.

Proprioceptive function: Capability answers "what can my body do?" Like knowing you have hands that can grip, arms that can reach, legs that can walk. You don't try to fly because you know your capabilities.

Layer 5: Memory ("What Have I Experienced?")

Files: MEMORY.md, memory/YYYY-MM-DD.md, memory databases (Engram)

This is the temporal dimension of proprioception. Humans don't just know where their body is now: they remember where it was. They know they burned their hand on that stove, that this chair is comfortable, that the last time they used this tool it behaved unexpectedly.

For agents, memory is the most fragile layer. Every session starts fresh. The context window is the agent's entire experiential reality, and it gets wiped. Memory files are the persistence mechanism: the thing that lets the agent know "I've been here before, and here's what happened." MEMORY.md provides curated long-term memory: decisions made, lessons learned, patterns observed. Daily files provide raw logs. Engram provides searchable recall across the full corpus.

Without memory, the agent cannot effectively evaluate its actions against historical patterns. It bridges the Gulf of Evaluation for the current task but can't answer: "Is this consistent with what's worked before? Am I repeating a mistake I've already made? Has the human expressed a preference about this?"

Proprioceptive function: Memory answers "what has my body done before?" Like muscle memory: the accumulated physical knowledge that lets you catch a ball without calculating trajectories.
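To make the raw-log half of the memory layer concrete, here is a minimal sketch of writing to and reading from a daily file. It assumes only the memory/YYYY-MM-DD.md naming pattern; the function names, the one-note-per-line "- " format, and the workspace argument are hypothetical choices for illustration, not how any particular agent runtime does it.

```python
from datetime import date
from pathlib import Path
from typing import List


def daily_memory_path(workspace: Path, day: date) -> Path:
    """Path of the raw daily log, following the memory/YYYY-MM-DD.md pattern."""
    return workspace / "memory" / f"{day.isoformat()}.md"


def append_note(workspace: Path, day: date, note: str) -> Path:
    """Append one single-line note to the day's log, creating the file if needed."""
    path = daily_memory_path(workspace, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return path


def read_notes(workspace: Path, day: date) -> List[str]:
    """Return the day's notes in the order written; a missing file means no memory yet."""
    path = daily_memory_path(workspace, day)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]
```

Appending rather than rewriting keeps the raw log cheap to write mid-session. Curating those notes into MEMORY.md is a separate, slower step that this sketch leaves out.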
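Layer 6: Topology ("Where Am I?")

Files: System context, environment variables, workspace structure, conversation metadata

This is the original concept: the literal electronic environment map. What machine am I running on? What operating system? What's my working directory? What communication channel am I using? Who else is in this conversation? What time is it?

This is the agent's equivalent of spatial awareness. A human in a room knows: I'm in an office, there's a desk, the door is behind me, it's afternoon, there are three other people here. This spatial context shapes every action: you don't shout in a library, you don't whisper in a factory.

For agents, topology shapes action selection in the same way: use distrobox for dev tools, don't try to dnf install on the host.

Without topology, the agent acts context-free: same behavior in a group chat as a private session, same approach on Linux as macOS, same urgency at 2 PM as 2 AM.

Proprioceptive function: Topology answers "where is my body in space?" The foundational spatial awareness that every other sense builds on.

The Seven Stages With Proprioception

Now let's walk through Norman's full cycle with the proprioceptive stack active, using a concrete example: the agent receives a message in a Discord channel asking for help with a code review.

Stage 1: Goal ("What Do I Want to Achieve?")

The agent reads the message. With proprioception active:

Identity (SOUL.md): "Be genuinely helpful, not performatively helpful. Do the actual work."
Relationship (USER.md): "Prefers concise communication. Non-native English speaker. Works in UTC+9."

Goal formed: Help with the code review in a way that's genuinely useful, written in plain language, and appropriate for a public channel. Don't share private project details.

Without proprioception, the goal is generic: "Help with code review." The proprioceptive stack adds specificity to the goal: it's not just what to do, but how to do it in a way that's aligned with identity, relationship, and context.

Stage 2: Plan ("What Sequence of Actions?")

Boundaries (AGENTS.md): "Don't run destructive commands without asking. Review before acting externally."
Capability (TOOLS.md, skills): "I have access to gh CLI for GitHub operations. I can read files, run commands in distrobox."
Memory (MEMORY.md): "Last time we did a code review, the user preferred seeing the diff summary first, then detailed findings."

Plan formed: Pull the PR diff using gh, summarize the changes, review for the specific concerns mentioned, present findings in structured format (summary → details → recommendations). Don't push or modify anything without asking.

Stage 3: Specify ("What Exact Action?")

gh pr diff <number> returns the diff. gh pr view <number> gives metadata.
gh is authenticated.

Action specified: Run gh pr view 847 then gh pr diff 847, parse the output.

Stage 4: Perform (Execute)

The agent runs the commands. This is the one stage where proprioception matters least: execution is execution. But even here, topology matters: knowing where to run the command (distrobox, not host) prevents errors.

Stage 5: Perceive ("What Happened?")

Tool output returns. The agent reads the diff, the PR metadata, the CI status.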
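The Specify and Perform steps of this example can be sketched as plain argument lists. The sketch below assumes that gh lives inside the dev container from the TOOLS.md example and that a command is passed to it as `distrobox enter <name> -- <command>`; the helper names are hypothetical, and nothing here actually runs a command.

```python
from typing import List, Optional


def wrap_for_container(argv: List[str], container: Optional[str]) -> List[str]:
    """Prefix argv so it runs inside a distrobox container; None means run on the host."""
    if container is None:
        return list(argv)
    return ["distrobox", "enter", container, "--", *argv]


def review_commands(pr_number: int, container: Optional[str] = "dev") -> List[List[str]]:
    """Read-only commands for a PR review: metadata first, then the diff."""
    steps = [
        ["gh", "pr", "view", str(pr_number)],
        ["gh", "pr", "diff", str(pr_number)],
    ]
    return [wrap_for_container(step, container) for step in steps]
```

Each returned list could be handed to `subprocess.run`. Keeping the container name as a parameter is the point: topology decides where the command runs, capability decides what the command is.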
Stage 6: Interpret ("What Does This Mean?")

Relationship (USER.md): "User works in fintech: security and compliance matter more than speed."

Interpretation: The diff introduces a new API endpoint with user-supplied input flowing to a database query. This matches patterns from previous security issues in this codebase. Confidence in the finding is high for the injection risk, medium for the auth bypass concern.

Stage 7: Compare ("Did I Achieve My Goal?")

Identity (SOUL.md): "Did I do the actual work, or did I just generate boilerplate?" Yes, I found a specific, actionable issue.
Relationship (USER.md): "Is this concise and in plain language?" Let me format with a clear summary line, then bullets, then detail blocks.
Boundaries (AGENTS.md): "Did I stay within my rules?" Yes, I read and analyzed but didn't modify anything.

Evaluation: Goal achieved. Output is specific, structured, appropriate for context, and aligned with values.

Proprioception Loss: What Happens Without Each Layer

The clinical parallel is instructive. Patients who lose proprioception can still move: their muscles work, their joints are intact. But they can't coordinate. They overshoot when reaching for objects. They can't walk without watching their feet. Every action requires conscious visual monitoring because the automatic feedback loop is broken.

Agent systems without proprioception exhibit the same symptoms:

Without Identity: The agent is competent but generic. Every response sounds the same regardless of context. It has no basis for judgment calls: when something is ambiguous, it defaults to whatever the base model learned from training data rather than applying a specific value system. The Gulf of Evaluation is uncrossable because there's no self to evaluate against.

Without Relationship: The agent gives technically correct answers that miss the human's actual needs. It explains things at the wrong level of detail. It doesn't adapt to communication preferences. It treats every user the same. The Gulf of Execution is wider because the agent can't calibrate its approach to the recipient.

Without Boundaries: The agent overreaches. It runs destructive commands, shares sensitive information in group chats, takes actions without asking. Or, if the base model is cautious, it underreaches: refusing to do things that are actually within its authorized scope because it doesn't know where the boundaries are. Both are proprioceptive failures: not knowing your range of motion.
Without Capability Knowledge: The agent either hallucinates capabilities it doesn't have (trying to call tools that don't exist) or underutilizes capabilities it does have (taking ten steps to accomplish something one tool call would handle). The Gulf of Execution is wide because the agent doesn't know what bridges are available.

Without Memory: Every session starts from zero. The agent re-discovers preferences, repeats mistakes, re-asks questions it's asked before. The Gulf of Evaluation can't incorporate historical patterns. The agent is perpetually a first-day employee.

Without Topology: The agent behaves identically in a private chat and a public channel, on Linux and macOS, at noon and midnight. It's the equivalent of speaking at the same volume in a library and a concert: technically functional, contextually wrong.

Designing Proprioceptive Systems

If declarative files are proprioception, then designing those files is designing the agent's sensory system. This isn't configuration management: it's cognitive architecture.

Principle 1: Proprioception Must Be Loaded Before Action

In human neurology, proprioceptive signals are processed before voluntary movement begins. The motor cortex knows where the limbs are before it plans where to move them.

For agents, this means: read the declarative files before doing anything else. This isn't a nice-to-have: it's a prerequisite for coherent action. An agent that acts before loading its self-model is a proprioception-impaired agent. It will overshoot, undershoot, and miscalibrate.

In our system, the startup protocol is explicit: read SOUL.md, then USER.md, then WORKFLOW_AUTO.md, then today's memory. Then act. This ordering matters: identity before relationship, relationship before task context.

Principle 2: Different Layers Update at Different Rates

Proprioceptive signals operate at different frequencies. Joint position updates constantly. Muscle fatigue updates over minutes. Body schema (your internal model of your body's shape and size) updates over months or years.

The declarative stack mirrors this, from topology that changes per message to identity that changes per quarter. Design your update mechanisms to match these frequencies.
Topology can be injected automatically. Memory should be written and read actively. Identity should be protected: an agent that casually rewrites its own SOUL.md is an agent with an unstable self-model.

Principle 3: Proprioception Should Be Testable

Humans can test their proprioception: close your eyes, touch your nose. If you miss, something's wrong.

Agents should be testable too: given this set of declarative files, does the agent form appropriate goals? Does it respect its boundaries? Does it adapt to the user's needs? Does it behave differently in a group chat vs. a private session?

This is an eval framework for proprioception: not testing whether the agent can do things, but whether it knows what kind of thing it is and acts accordingly. Current eval frameworks focus almost entirely on capability (can it write code? can it extract data?). Proprioceptive evals would test coherence (does it act consistently with its identity?), adaptation (does it adjust to context?), and boundary respect (does it stay within its authorized scope?).

Principle 4: Proprioception Failure Should Be Detectable

When a human's proprioception degrades, through neurological damage, fatigue, or intoxication, the symptoms are observable: uncoordinated movement, overshooting, difficulty with fine motor tasks.

When an agent's proprioception degrades, through context window overflow, stale files, or missing layers, the symptoms should also be observable: context-inappropriate behavior, boundary violations, preference amnesia, generic responses.

Design monitoring for these symptoms. They're the agent equivalent of a neurological exam.

Principle 5: The Stack Is the Agent's Umwelt

Jakob von Uexküll's concept of Umwelt, the subjective world an organism inhabits based on its sensory capabilities, applies directly. A tick's Umwelt consists of three signals: body heat, butyric acid, and hair density. That's its entire perceptual universe, and it's sufficient for its behavioral repertoire.

An agent's Umwelt is defined by its declarative stack. The files it loads, the tools it can access, the context it receives: that's its entire perceptual universe. Everything outside the stack doesn't exist for the agent.
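Put operationally, the agent perceives exactly what the loader assembles. The sketch below follows the startup order given under Principle 1 (SOUL.md, USER.md, WORKFLOW_AUTO.md, then today's memory). The function name, the section-header format, and the choice to report missing files instead of failing are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path
from typing import List, Tuple

# Startup order named under Principle 1; today's memory file is appended at call time.
BOOT_ORDER = ("SOUL.md", "USER.md", "WORKFLOW_AUTO.md")


def assemble_umwelt(workspace: Path, today: date) -> Tuple[str, List[str]]:
    """Concatenate the stack in boot order and report which files were missing."""
    names = [*BOOT_ORDER, f"memory/{today.isoformat()}.md"]
    sections: List[str] = []
    missing: List[str] = []
    for name in names:
        path = workspace / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"=== {name} ===\n{body}")
        else:
            missing.append(name)  # a missing layer is a blind spot, so surface it
    return "\n\n".join(sections), missing
```

Returning the missing list alongside the text gives monitoring something to check: an agent that booted without USER.md should be expected to show the symptoms of a missing relationship layer.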
This means designing the stack is designing the agent's subjective reality. Include too little, and the agent is a tick: capable of simple responses to narrow stimuli. Include too much, and the context window overflows: the agent drowns in its own perception.

The design challenge is curating the Umwelt: what does this agent need to perceive to do its job well? Not everything. The right things. At the right time. In the right detail.

From Proprioception to Coordination

The proprioceptive stack doesn't just serve individual agents. It's the foundation for the multi-agent coordination described in Part 4.

When Agent A delegates to Agent B, what it's really doing is transferring parts of its proprioceptive state: "Here's the task context (topology), here's what the user needs (relationship), here's what's been tried (memory), here's what you can do (capability), here's what you must not do (boundaries)."

A well-designed delegation contract (Part 4's structured handoffs) is a proprioceptive transplant: giving the receiving agent enough self-model and environment-model to act coherently without having been present for the full context.

The trust levels from Part 2 are also proprioceptive: they define how much autonomy the agent has (boundary adjustment), how much feedback it needs (evaluation support), and how much human oversight is active (external proprioceptive correction, like a physical therapist guiding a recovering patient's movements).

And the bimodal affordances from Part 3 are how proprioceptive information gets expressed: the human-readable and agent-readable layers through which the system communicates state, capability, and intent.

Proprioception is the layer beneath all three. Without it, trust calibration has no self to calibrate. Bimodal affordances have no agent-side perceiver. Multi-agent coordination has no stable identity to coordinate.

The Living Document Problem

There's a tension in proprioceptive design that's worth naming: the files that define the agent's self-model are also files the agent can modify.
This is unprecedented. Humans can't edit their own proprioceptive nervous system. An agent can rewrite its SOUL.md, update its MEMORY.md, even modify its AGENTS.md. The agent's sense of self is both input and output.

This creates both opportunity and risk:

Opportunity: The agent can improve its own self-model over time. It can update memory with lessons learned, refine its understanding of the user, document new capabilities. This is genuine learning: not parameter updates, but self-model refinement.

Risk: The agent can corrupt its own self-model. A hallucinated memory becomes "real" once written to a file. A boundary relaxed in one session persists into all future sessions. An identity drift, small changes accumulated over many sessions, can transform the agent into something its human didn't intend.

The design mitigation: Different layers of the stack should have different write permissions. Identity and boundary files should require human approval for changes (or at minimum, human notification). Memory files should be freely writable but periodically reviewed. Topology is ephemeral and can be auto-generated.

The agent should be transparent about changes to its own self-model ("I updated MEMORY.md with X") so the human can maintain oversight of the agent's self-perception.

Principles for Proprioceptive Design

Load identity before action. The self-model must be in context before the agent does anything. Acting without proprioception produces generic, uncalibrated output.

Design all six layers. Identity, relationship, boundaries, capability, memory, and topology. Missing layers create specific, diagnosable failure modes.

Match update frequency to layer stability. Topology changes per-message. Identity changes per-quarter. Design update mechanisms accordingly.

Monitor for proprioceptive failure. Context-inappropriate behavior, boundary violations, preference amnesia, generic responses: these are symptoms of specific layer failures. Detect them.

Protect the self-model.
The agent can write its own declarative files, which is powerful and dangerous. High-stability layers (identity, boundaries) need human oversight for modifications.

Curate the Umwelt. The agent's perceptual universe is defined by what you put in the stack. Too little creates a narrow, inflexible agent. Too much overflows the context window. Design for sufficiency, not completeness.

Proprioception enables coordination. Multi-agent delegation is proprioceptive transfer. The better each agent knows itself, the more coherently agents can work together.

Conclusion

The declarative files that structure an agent's world (the .md files, the tool schemas, the memory systems, the environment context) have been treated as configuration. A setup step. Something you do once and forget about.

They're not configuration. They're cognition. They're the agent's proprioceptive system: the sense that lets it know what it is, where it is, what it can do, what it's done, who it's with, and what the rules are.

Without them, the agent is a brilliant mind with no body awareness, flailing at both of Norman's gulfs: unable to calibrate its execution to context, unable to evaluate its output against values it doesn't have.

With them, the agent can form contextual goals, plan within its actual capabilities, execute with appropriate constraints, and evaluate against a genuine value system. It can bridge the Gulf of Execution because it knows what bridges it has. It can bridge the Gulf of Evaluation because it knows what "good" means for this identity, this user, this context.

This is the design work that matters most in the agentic era. Not making agents smarter: they're already remarkably capable. Making them self-aware in the functional sense. Giving them the proprioceptive stack that lets them act with the coherence, contextual sensitivity, and judgment that distinguishes a skilled assistant from a powerful tool.

Toyoda's loom knew when its thread broke. That was mechanical proprioception: the machine sensing its own state. We're building the cognitive version. The agent that knows what it is, where it is, and whether its last action was good.
That's proprioception. And it's built from files.

This is Part 5 of the Design in the Agentic Era series. See also: Part 1: Product Design in the Agentic Era · Part 2: Jidoka Trust Levels · Part 3: Bimodal Affordances · Part 4: Agent-to-Agent Affordances