This text was originally written in French and has been translated by AI. The original is an adaptation of a mini-dissertation submitted to Pierre Wagner (Paris I) in December 2025, as part of a philosophy of logic course, which received a mark of 18/20.

Introduction

Contemporary artificial intelligence, and particularly connectionist machine learning, appears to operate according to principles radically different from those that govern the traditional practice of definition. Where the logician constructs explicit definitions respecting the strict criteria of eliminability and non-creativity, the neural network learns distributed vector representations whose epistemological status remains uncertain. This situation raises a problem: how does the vector representation of concepts in connectionist models relate to what a definition is in the classical sense of the term?

The question is not simply technical. It engages our very understanding of what it means to define, to circumscribe a concept, to trace the boundaries of meaning. The classical theory of definition [Wagner, 2017] rigorously establishes the requirements that any legitimate definition must satisfy: eliminability of the defined term, non-creativity or conservativity, univocity of the definiendum. These criteria, forged in the logical and philosophical tradition, from Pascal’s De l’esprit géométrique [Pascal, 1657] to Frege [Frege, 1884] and Russell [Russell, 1905], constitute our obligatory starting point.

Yet the connectionist paradigm, founded on Firth’s distributional hypothesis according to which “you shall know a word by the company it keeps” [Firth, 1957], proposes a mode of access to meaning that seems to bypass explicit definition. A word is no longer defined by a substitutable definiens, but represented by a vector in a high-dimensional space, whose meaning emerges from its relative position with respect to other vectors. Are we still in the order of definition?

This mini-dissertation proposes to examine this tension through three moments. We shall first establish what a definition is in the classical sense, relying principally on the theory of definition [Wagner, 2017]. We shall then interrogate the status of vector representations with regard to classical criteria. Finally, we shall seek to determine whether the connectionist paradigm invalidates the classical theory or on the contrary reveals an implicit and enlarged form of it.

The Classical Edifice of Definition

The Functions of Definition

Before examining the formal rules of definition, it is fitting to recall its functional diversity. Classical theory distinguishes several definitional modalities that do not pursue the same objectives [Wagner, 2017].

The distinction between nominal definition and real definition, inherited from mediaeval scholasticism and taken up by Pascal, opposes two aims. Nominal definition (definitio nominis) is content to explain the use of a word or to introduce a convenient abbreviation. Pascal thus affirms that “definitions are made only to designate the things that are named, and not to show their nature” [Wagner, 2017]. Conversely, real definition (definitio rei), in the Aristotelian tradition, claims to grasp the essence of the thing, to say what it is in itself.

A second distinction separates stipulative definitions from descriptive definitions. The former introduce a new meaning by arbitrary decision (“let $x$ be the smallest prime number”), whilst the latter report the established usage within a linguistic community, as the lexicographer does [Wagner, 2017].

Finally, and this is perhaps the most pertinent distinction for our purpose, one opposes explicit definitions to implicit definitions. An explicit definition permits the direct substitution of the definiendum by the definiens. An implicit definition, in the sense of Gergonne and Hilbert, defines primitive terms not in isolation, but by their mutual relations within a system of axioms [Wagner, 2017]. The axioms of Hilbertian geometry, for example, implicitly define “point”, “line” and “plane” by the structural relations they impose on these terms.

Eliminability of the Defined Term

The first criterion of classical theory is that of eliminability. A definition is essentially an abbreviation. If a term $T$ is defined by an expression $P$, then in any proposition where $T$ appears, it must be possible to replace $T$ by $P$ without altering the truth value of the proposition [Wagner, 2017].

Formally, if we introduce a new symbol $s$ in a language $\mathcal{L}$ via a definition $\delta$, the extended language $\mathcal{L}'$ must contain no semantic content irreducible to $\mathcal{L}$. The definition “1 = S(0)” (where $S$ is the successor function) enables us to systematically eliminate the symbol “1” from any arithmetic formula.

This requirement of eliminability guarantees that a definition is not an ontological enrichment of the language, but simply an instrument of concision. As Arnauld and Nicole note, the care devoted to definitions enables us to “abbreviate discourse”, but on condition that one can always “mentally substitute the definition in place of the defined term” [Wagner, 2017].

However, eliminability is not always a simple term-by-term substitution. In the case of contextual definitions, such as that of logical conjunction $\land$ defined from negation $\neg$ and the conditional $\to$, one cannot directly substitute $\land$ by a fixed expression; one must replace the entire context $(A \land B)$ by $\neg(A \to \neg B)$ [Wagner, 2017].

Moreover, eliminability is only valid in certain contexts. The word “prime” is eliminable in favour of “divisible by exactly two distinct numbers” in the context “13 is…”, but not in the context “Léa is unaware that 13 is…”, where the principle of substitutability of identicals does not apply [Wagner, 2017].

Non-Creativity and Conservativity

The second criterion is that of non-creativity or conservativity. A purely definitional definition must not enable the demonstration of anything whatsoever about primitive terms that was not already demonstrable before its introduction [Wagner, 2017].

This requirement can be formalised thus: let $\mathcal{L}$ be a language in which a theory $T$ is formulated, and let $\mathcal{L}'$ be the language obtained by adjoining a word $m$ defined by $\delta$. One says that $\delta$ satisfies the condition of conservativity relative to $T$ if, and only if, any statement of $\mathcal{L}$ demonstrable from $T \cup \{\delta\}$ is also demonstrable from $T$ alone [Wagner, 2017].

In other words, if $T$ represents our knowledge and $\delta$ a definition, then $\delta$ must not enrich our stock of knowledge expressible in the original language. A definition that violated this principle would not be purely conceptual; it would convey epistemic content.

A classical example illustrates this well. The pseudo-definition “Alpha is the largest prime number” is creative, for it contradicts Euclid’s theorem on the infinitude of prime numbers [Wagner, 2017]. More dramatic still, the Russellian definition of the set $E$ of all sets that do not belong to themselves directly engenders the contradiction $E \in E \leftrightarrow E \notin E$ [Wagner, 2017].

Implicit Definition in the Sense of Gergonne and Hilbert

Classical theory devotes an important analysis to implicit definition, which it carefully distinguishes from explicit definition [Wagner, 2017]. In Hilbertian geometry, the primitive terms “point”, “line”, “plane” are not defined in isolation by explicit definiens. They are defined globally by the set of axioms that constrain their possible interpretations.

This relational and holistic conception of definition renounces the ideal of an intrinsic meaning, given outside of any structure. The “sense” of a point is defined exclusively by its behaviour within the axiomatic system. It has no essence outside its relations.

The Connectionist Paradigm as Challenge to Classical Theory

The Distributional Hypothesis

Contemporary natural language processing rests on a principle that seems foreign to the logic of explicit definition: the distributional hypothesis, formulated by the linguist John Rupert Firth in 1957: “You shall know a word by the company it keeps” [Firth, 1957].

This maxim radically displaces the foundation of meaning. In classical logic, identity is absolute: $A = A$. In distributional semantics, identity becomes similarity: $A \approx B$ if and only if $\text{Context}(A) \approx \text{Context}(B)$. Meaning is no longer an intrinsic property, an essence to be captured by a definiens, but a function of the contextual distribution of the word.

The technical realisation of this hypothesis is word embedding, in which each word is mapped to a vector in a high-dimensional space $\mathbb{R}^d$ [Mikolov et al., 2013; Pennington et al., 2014]. The “definition” of the word “cat” becomes a vector of 512 real numbers. This vector encodes the statistical co-occurrences of the word in a vast corpus.
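To make this concrete, here is a minimal sketch with toy vectors. The four-dimensional values are hand-picked for illustration, not learnt embeddings; real models learn hundreds of dimensions from corpus statistics:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (hand-picked for illustration; real
# models learn these coordinates from corpus co-occurrence statistics).
emb = {
    "cat":   np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.9, 0.2, 0.0]),
    "table": np.array([0.1, 0.2, 0.9, 0.1]),
    "idea":  np.array([0.0, 0.1, 0.1, 0.9]),
}

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

# "cat" lies closer to "dog" than to "table", and farther still from "idea":
d_dog, d_table, d_idea = (euclidean(emb["cat"], emb[w]) for w in ("dog", "table", "idea"))
assert d_dog < d_table < d_idea
```

The point of the sketch is that each vector, taken alone, is an uninterpretable tuple of numbers; the graded distances between vectors are what carry the “definition”.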

Embedding as Mathematised Structuralism

What is an embedding vector as definition? The answer engages an ontology of meaning radically different from that which underlies classical definition.

In the Aristotelian and logicist tradition, to define a term consists in exhibiting its intrinsic content: “a triangle is a closed figure with three sides”. The definiens enunciates essential properties that the defined object possesses, independently of any relation to other objects. Meaning is conceived as a semantic substance attached to the sign.

The embedding vector breaks with this substantialist metaphysics. The vector $\vec{v}_{\text{cat}} \in \mathbb{R}^{512}$ bears in itself no intrinsic semantic property. Considered in isolation, this tuple of 512 real numbers is devoid of signification. It becomes bearer of meaning only by its relative position in vector space—that is to say by its distances from other vectors.

This conception mathematically realises the structuralist intuition of Ferdinand de Saussure, for whom “in language, there are only differences without positive terms”. The meaning of a word does not belong to it as property; it emerges from the system of differences that structures language as totality. Vector embedding gives a precise geometric form to this idea: the meaning of “cat” is its metric difference from “dog” (small distance), “table” (medium distance), “idea” (large distance), and all the other terms of the vocabulary.

One can therefore say that embedding is a purely oppositional definition: “cat” is defined by being neither “dog”, nor “house”, nor “run”, according to a system of Euclidean distances in $\mathbb{R}^d$. Where classical definition responds to the question “what is $X$?” by a positive formula (“$X$ is $P$”), vectorial definition responds by a negative localisation: “$X$ occupies this position in the network of differences”.

This ontology has an immediate consequence: it invalidates the classical requirement of univocity. Pierre Wagner insists on the fact that a definiendum must have a fixed and unique sense. Yet an embedding vector has no stable identity outside the system that produces it. Change the training corpus, the network architecture or the hyperparameters, and “cat” will receive a numerically distinct vector. Meaning is not an immutable essence, but an unstable and revisable configuration.

Geometry and Invariance: Identity up to Rotation

The Saussurian arbitrariness of the sign finds its mathematical echo in the arbitrariness of vector bases. A neural network can be subjected to a global rotation without its functional properties changing. What matters is not the absolute coordinates of neurons individually ($x_i$), but the geometric relations that unite them.

We can formalise this intuition by analysing cost functions and similarity metrics. Model learning is steered by the minimisation of cross-entropy between the predicted distribution $q$ and the target distribution $p$.

\[\begin{align} H(p, q) &= -\sum_{x \in \mathcal{X}} p(x) \log q(x) \\ &= -\mathbb{E}_{x \sim p} [\log q(x)] \end{align}\]

This quantity forces internal representations to organise themselves so as to maximise the linear separability of classes or the probability of the next token.
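The cross-entropy above can be computed directly. In this sketch the target $p$ is a one-hot distribution over three tokens and the two model distributions $q$ are invented for the example:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), the quantity minimised in training."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# Target distribution p (one-hot on the correct next token) and two
# candidate model distributions q: the better prediction has lower H(p, q).
p = [1.0, 0.0, 0.0]
q_good = [0.9, 0.05, 0.05]
q_bad = [0.2, 0.4, 0.4]
assert cross_entropy(p, q_good) < cross_entropy(p, q_bad)
```

Minimising this quantity over millions of examples is what pressures the internal representations into their geometric organisation.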

But how are these representations to be compared? If one trains two identical models with different random initialisations, the vectors for “cat” will be numerically very different. Yet they capture the same “definition”. We therefore need a similarity measure that is invariant under rotations and, more generally, orthogonal transformations.

The classical Euclidean distance $\|\vec{u} - \vec{v}\|$ is sensitive to norm. Computer scientists in natural language processing often prefer cosine similarity, which considers only the angle:

\[\begin{align} \text{sim}(\vec{u}, \vec{v}) &= \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \|\vec{v}\|} \\ &= \cos(\theta) \end{align}\]
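The contrast between the two measures can be shown in a few lines: two vectors pointing in the same direction but with different norms are far apart in Euclidean terms yet maximally similar in cosine terms.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between u and v: u.v / (|u| |v|)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])  # same direction, twice the norm

# Euclidean distance sees a large gap; cosine similarity sees identity.
assert np.linalg.norm(u - v) > 3.0
assert abs(cosine_sim(u, v) - 1.0) < 1e-12
```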

However, to compare entire representation spaces (for example, layer 3 of model A and layer 3 of model B), the most robust measure is Centred Kernel Alignment (CKA). It enables quantification of the similarity between two representation matrices $X \in \mathbb{R}^{n \times d_1}$ and $Y \in \mathbb{R}^{n \times d_2}$ in a manner invariant to orthogonal transformations:

\[\begin{align} \text{CKA}(K, L) &= \frac{\text{HSIC}(K, L)}{\sqrt{\text{HSIC}(K, K)\text{HSIC}(L, L)}} \end{align}\]

where $K = XX^T$ and $L = YY^T$ are the similarity (kernel) matrices between examples. If $X' = XQ$ with $Q$ an orthogonal matrix, then $K' = (XQ)(XQ)^T = XQQ^TX^T = XX^T = K$. Invariance is thus mathematically guaranteed.
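A sketch of the linear variant of CKA makes the invariance checkable. For a linear kernel on centred features, the HSIC terms reduce to squared Frobenius norms of cross-covariance matrices; the data here are random, purely for illustration:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representation matrices X (n x d1) and Y (n x d2)."""
    X = X - X.mean(axis=0)  # centring, required by the "centred" in CKA
    Y = Y - Y.mean(axis=0)
    hsic_xy = np.linalg.norm(Y.T @ X, "fro") ** 2
    hsic_xx = np.linalg.norm(X.T @ X, "fro") ** 2
    hsic_yy = np.linalg.norm(Y.T @ Y, "fro") ** 2
    return hsic_xy / np.sqrt(hsic_xx * hsic_yy)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                   # 50 examples, 8 dimensions
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # a random orthogonal matrix

# Rotating the representation leaves CKA unchanged, as the K' = K argument shows.
assert abs(linear_cka(X, X @ Q) - 1.0) < 1e-8
```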

This mathematical invariance is not a mere technical detail. It reveals a strong ontological thesis: the “definition” of a concept in a neural network is not the particular vector (which depends on random initialisation), but the geometric structure invariant under isometric transformations. This is a form of structural realism: what is real is not the substance (numerical coordinates), but the form (metric relations). This position recalls Poincaré and Maxwell, for whom what survives changes of representation constitutes the objective content of scientific knowledge. The true “definition” is therefore the relative geometry of the point cloud, not its contingent coordinates.

Definition as Attention Calculation

A major limitation of early vector models (Word2Vec type) resided in the static character of definition: a word received a unique vector, independent of its varied senses. The Transformer architecture [Vaswani et al., 2017] resolves this problem by rendering definition dynamic thanks to the attention mechanism.

In this formalism, the meaning of a token is not given; it is calculated. For a token at position $i$, one projects its initial vector $x_i$ into three distinct spaces: a query $q_i$, a key $k_i$, and a value $v_i$, by multiplication with learnt matrices $W^Q, W^K, W^V$.

The semantic contribution of each context word $j$ to the target word $i$ is determined by an attention score $A_{i,j}$, calculated as the compatibility between the query of one and the key of the other:

\[A_{i,j} = \text{Softmax}_j \left( \frac{q_i \cdot k_j}{\sqrt{d_k}} \right)\]

The final representation $z_i$ of the word is then the weighted sum of the values of all context words:

\[z_i = \sum_{j} A_{i,j} v_j\]

This equation gives a precise mathematical form to the idea of contextual definition. The “meaning” $z_i$ is literally the sum of traces left by other words ($v_j$), weighted by their contextual relevance ($A_{i,j}$).
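The two equations above can be condensed into a single-head attention sketch. The matrices here are random rather than learnt, and the dimensions are tiny, but the computation is the one the text describes:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over token vectors X (n x d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))  # A[i, j]: relevance of token j to token i
    return A @ V, A                      # z_i = sum_j A[i, j] * v_j

rng = np.random.default_rng(1)
n, d = 4, 6                              # 4 tokens, 6 dimensions (toy sizes)
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Z, A = attention(X, W_q, W_k, W_v)

assert Z.shape == (n, d)
assert np.allclose(A.sum(axis=1), 1.0)   # each token's weights form a distribution
```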

Contextual Polysemy as Refutation of Univocity

Classical theory of definition confronts a major empirical obstacle: contextual polysemy. The same word often possesses several related but distinct senses. Classical definition treats this phenomenon as an ambiguity to be resolved: “bank$_1$” = financial institution, “bank$_2$” = edge of a watercourse. Context then plays the role of selector between these pre-established significations.

But this conception proves inadequate in the face of utterances such as “I slept on the bank” (in the French original, banc, whose senses include “bench”). Which sense of “bank” is selected here? Neither the financial institution nor the river bank is suitable—unless one supposes a third sense, “bank$_3$” = piece of furniture. Yet this ad hoc multiplication of senses reveals the failure of the strategy: rather than a unique definition with pre-established senses, we have a potential infinity of micro-senses adjusted to context.

The problem is deeper still. Consider “the temperature is rising”. Is it a thermometer, fever, social tension? Classical definition would require stipulating in advance all possible senses of “rising”, or introducing a definiens sufficiently abstract to cover them all (“to increase according to a scalar dimension”). But this latter option amounts to admitting that the concrete sense of “rising” in each context cannot be deduced from the general definition—it must be constructed by contextual interpretation.

It is here that the attention mechanism of transformers reveals its philosophical importance. Let us recall the equation:

\[z_i = \sum_{j} A_{i,j} v_j\]

This formula shows that the final representation $z_i$ of word $i$ is not selected from amongst a stock of pre-established senses, but calculated dynamically by weighted aggregation of the contributions of all contextual words $v_j$. In “I slept on the bank”, the final vector of “bank” incorporates semantic traces of “slept” and “on”, producing a representation sui generis that did not exist before the calculation.
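A stripped-down demonstration of this point needs no learnt matrices at all: weight the context by raw dot products and the same word vector already yields different representations in different sentences. The vectors below are invented toys:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextual(word, context, emb):
    """z = sum_j A_j * v_j over the sentence, a toy version of the text's
    equation: attention scores are raw dot products, with no W^Q, W^K, W^V."""
    sentence = [word] + context
    vecs = np.stack([emb[w] for w in sentence])
    A = softmax(vecs @ emb[word])     # relevance of each token to "word"
    return A @ vecs                   # weighted aggregation of the context

emb = {
    "bank":  np.array([0.5, 0.5, 0.5]),
    "money": np.array([1.0, 0.0, 0.0]),
    "river": np.array([0.0, 1.0, 0.0]),
}

z_financial = contextual("bank", ["money"], emb)
z_riparian  = contextual("bank", ["river"], emb)

# The same word receives different representations in different contexts.
assert not np.allclose(z_financial, z_riparian)
```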

Meaning is therefore not consultation of a mental dictionary, but computation. This idea finds an echo in ordinary language philosophy. Wittgenstein, in the Philosophical Investigations, affirms: “Don’t ask for the meaning, ask for the use” (§43). For him, the meaning of a word is not a fixed entity associated with the word, but its manner of being employed in language games.

The attention mechanism is a mathematical formalisation of this Wittgensteinian intuition: the “meaning” of “bank” in a given sentence is literally its use in that particular syntactic and semantic configuration, encoded as a resulting vector. Definition ceases to be an a priori stock of significations and becomes an in situ process of constructing meaning.

This invalidates the central presupposition of classical theory: that to define consists in fixing meaning once and for all. The connectionist paradigm shows on the contrary that meaning is constitutively variable, contextual and emergent.

Eliminability Violated

Let us confront this practice with the classical criterion of eliminability. Can one eliminate an embedding vector in favour of a linguistic expression that would be synonymous with it?

The answer is manifestly negative. The vector $\vec{v}_{\text{king}}$ is not a string of language symbols. It is a mathematical entity of a radically different ontological nature—a point in $\mathbb{R}^{512}$. One cannot “mentally substitute” this vector for the word “king” in an ordinary sentence. The translation of the word into vector is not a substitution of equivalent symbols, but a projection into a distinct phenomenological space.

Moreover, this projection is “lossy”—it loses information, or at least irreversibly transforms the nature of information. The vector captures a statistical distribution, not a compositional meaning in the traditional sense. There is no strict equivalence in the classical sense, but a probabilistic correspondence. The question of compositionality in language models, notably Transformer architectures [Vaswani et al., 2017] and contextual models such as BERT [Devlin et al., 2019], is the subject of active research [Pommeret et al., 2025].

Apparent Creativity or Structural Discovery?

The second classical criterion—non-creativity—seems flagrantly violated by connectionist models. Vector relations permit famous inferences of the type:

\[\vec{v}_{\text{king}} - \vec{v}_{\text{man}} + \vec{v}_{\text{woman}} \approx \vec{v}_{\text{queen}}\]

This “vector arithmetic” produces knowledge that was not explicitly encoded in the individual definitions of terms. If $T$ designates our theory of language and $\delta$ the vector representation of “king”, then $T \cup \{\delta\}$ permits inferring semantic relations (king:man::queen:woman) that were not demonstrable in $T$. In this sense, vector definition is creative: it enriches our stock of knowledge.
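The analogy inference can be sketched with toy vectors in which a “royalty” dimension and two “gender” dimensions are encoded by construction; real embeddings learn such directions, approximately, from corpora:

```python
import numpy as np

# Toy vectors: dimension 0 encodes royalty, dimensions 1-2 encode gender.
# Hand-built for illustration; learnt embeddings only approximate this.
emb = {
    "king":  np.array([1.0, 1.0, 0.0]),   # royal + male
    "queen": np.array([1.0, 0.0, 1.0]),   # royal + female
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
}

def nearest(v, emb, exclude=()):
    """Vocabulary word closest to v, excluding the query terms as is customary."""
    return min((w for w in emb if w not in exclude),
               key=lambda w: np.linalg.norm(emb[w] - v))

target = emb["king"] - emb["man"] + emb["woman"]
assert nearest(target, emb, exclude=("king", "man", "woman")) == "queen"
```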

But is this “creativity” real or apparent? The question merits careful examination.

On one side, one can maintain that the inference “king - man + woman = queen” creates no new knowledge, but reveals a latent semantic structure already present in linguistic usage. This relation of analogy between royalty and gender was implicitly contained in the manner in which speakers employ these words. The vector model does no more than render explicit what was already implicitly knowable—exactly as the geometer who demonstrates a theorem renders explicit what was implicit in the axioms.

In this perspective, the criterion of non-creativity is not violated. If one defines $T$ not as a finite set of explicit propositions, but as the set of all semantic truths implicit in the linguistic usage of a community, then the relation “king:man::queen:woman” was already in $T$. The connectionist model adds nothing; it extracts.

However, this response raises an important objection. Chomsky and the symbolists would object that statistical regularities of surface do not capture the deep structure of language—the universal, innate syntactic rules that generate linguistic competence. Vector arithmetic “king - man + woman = queen” would then be a superficial artefact, not an authentic structural discovery.

But one can turn the argument round: what if Chomskyan “deep structure” were itself merely an artefact of our rationalist obsession with explicit rules? What if real linguistic competence were statistical, distributed, emergent? The two positions reveal different levels of analysis: connectionism models effective performance, symbolism aims at idealised competence. Neither exhausts the phenomenon.

On the other side, one can object that this geometric structure is an artefact of the learning process, without real counterpart in human linguistic competence. No naïve speaker mentally manipulates vectors of 512 dimensions nor performs vector subtractions. The analogical relations captured by embeddings are perhaps statistical regularisations produced by optimisation, rather than authentic cognitive structures.

In this sceptical reading, the model is indeed creative: it introduces new conceptual relations that do not belong to ordinary semantics, but to an artificial geometry projected onto linguistic data. The “discovery” that king - man + woman = queen is then a property of the model, not of natural language.

This debate reveals an ambiguity: what exactly does a vector embedding define? Does it define the word as it is used by speakers (semantic realism), or as it is modelled by a particular architecture trained on a particular corpus (instrumentalism)?

Philosophy of language offers a useful distinction here. Putnam, in “The Meaning of ‘Meaning’” [Putnam, 1975], distinguishes the intension of a term (its sense, its definition) and its extension (the set of objects to which it refers in effective usage). Classical definitions aim to capture intension; embeddings capture extensional regularities.

An embedding vector does not encode “what it means to be a king”, but “how the word ‘king’ behaves in observed contexts”. It does not define the concept, but models its distribution. In this sense, embeddings do not so much violate the criterion of non-creativity as they change domain: they leave the plane of conceptual definition for that of statistical modelling.

Vector Implicit Definition: An Analogy with Hilbert

Faced with the apparent failure of classical criteria (eliminability, non-creativity), must one conclude that vector representations are not definitions? This would be hasty. They could constitute a particular form of implicit definition, in the enlarged sense that classical theory already recognises as legitimate.

The analogy with Hilbertian geometry is instructive. In Hilbert’s axiomatic system, the primitive terms “point”, “line”, “plane” receive no explicit definition. They are defined globally by the set of axioms that constrain their possible interpretations. The “sense” of a point is nothing other than the structural role it plays in axiomatic relations.

In an analogous manner, in a connectionist model, the meaning of a token is not given by a linguistic definiens, but by its interaction with all other tokens in the space structured by the network weights. Weights play the role of axioms: they implicitly define admissible relations between terms.

Let us construct this analogy more rigorously:

| Hilbertian implicit definition | Connectionist vector definition |
| --- | --- |
| Undefined primitive terms: point, line, plane | Tokens: “cat”, “dog”, “king” |
| Axioms: logical propositions constraining relations | Network weights: parameters constraining vector distances |
| Meaning = satisfying the axioms in a model | Meaning = position in embedding space |
| Structural uniqueness if models are isomorphic | Structural plurality: embeddings depend on corpus |

However, the analogy also reveals a crucial difference, epistemological in nature.

In Hilbert, axioms are a priori propositions whose status is (at least in the logicist interpretation) that of analytic truths or necessary stipulations. If “between” satisfies the axioms of Euclidean geometry, it is because it possesses necessarily Euclidean structure. Axioms constrain a priori the space of possible interpretations.

By contrast, in the connectionist paradigm, network weights are parameters learnt empirically by gradient descent on a corpus. They encode statistical regularities observed in that particular corpus. If “cat” possesses such-and-such a vector, this is not a necessary truth about the concept of cat, but a contingent fact about the distribution of the word in Wikipedia, in Common Crawl, or in whatever other corpus served for training.

Hilbertian implicit definition is therefore a priori, where vector definition is a posteriori. This distinction has major consequences. Firstly, Hilbertian axioms claim universality, whereas embeddings are always relative to a corpus and a specific architecture. Secondly, where axioms impose necessary constraints, neural weights only capture probable tendencies. Finally, if Hilbertian definition aspires to logical rigour, vector definition assumes a form of statistical fluidity.

This difference recalls the Quinean critique of the analytic/synthetic distinction. In “Two Dogmas of Empiricism” [Quine, 1951], Quine contests the idea that there exists a clear frontier between definitional truths (analytic) and factual truths (synthetic). All knowledge forms a holistic network where “definitions” themselves can be revised in the light of experience.

The connectionist paradigm proves Quine right: the “definition” of a word is not a fixed analytic kernel, but a revisable inductive structure, anchored in observed linguistic usage. Embeddings are therefore not definitions in the classical sense, but empirical models of meaning.

The Limits of Vector Opacity

Definitional Transparency versus Computational Opacity

If vector representations can be understood as enlarged forms of definition, they are nevertheless distinguished from classical definitions by a crucial epistemic property: their opacity.

A classical definition possesses a virtue of transparency: it explicitly exposes the conditions of application of the concept. The definition “a prime number is a natural number divisible by exactly two distinct numbers” enables me not only to identify prime numbers, but above all to understand why 13 is prime and 14 is not. The definition provides the reasons for classification.

An embedding vector, conversely, is opaque. The vector $\vec{v}_{\text{prime}} \in \mathbb{R}^{512}$ can enable me to predict with precision whether a number is prime (if the model has been trained on this task), but it explains nothing to me. I cannot inspect this vector to read in it the criteria that make a number prime. Vector representation encapsulates statistical regularities without articulating them in the form of intelligible criteria.

This difference is not simply pragmatic; it is epistemological. Classical definition satisfies an ideal of knowability: to know what $X$ is, is to be able to enunciate explicitly the conditions that make an object fall under $X$. The embedding vector satisfies an ideal of predictability: to know how to treat $X$ is to be able to calculate correct responses involving $X$.

One recognises here the opposition between two conceptions of knowledge: the rationalist model (to know is to possess articulable reasons) and the connectionist model (to know is to have processing capacities).

The Problem of Hallucination

This opacity has important practical consequences. Generative language models, trained on vector representations, frequently produce plausible but false affirmations—the phenomenon called “hallucination”. The model generates “The Eiffel Tower was inaugurated in 1887” (false: 1889) because this proposition is statistically coherent with learnt patterns, even if it is factually incorrect.

This problem reveals a structural limitation of vector representations: they capture verisimilitude (what resembles training data) rather than truth (what corresponds to facts). A vector has no intrinsic criterion of factual correctness; it only encodes conditional probabilities.

Faced with this challenge, several works in artificial intelligence have attempted to reintroduce discrete and verifiable structures. The so-called “retrieval-augmented generation” (RAG) approach is emblematic. It rests on the decomposition of text into atomic propositions—statements that contain exactly one distinct, autonomous and minimal fact [Pommeret et al., 2024].

For example, the sentence “The dog and the cat are in the kitchen” would be decomposed into two atomic propositions: “The dog is in the kitchen” and “The cat is in the kitchen”.

Each proposition can then be verified independently against a reliable knowledge base. The model’s response is validated or corrected according to the factual correspondence of its atomic propositions.
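The verification step can be sketched schematically. This toy version checks atomic propositions (here, hypothetical subject-relation-object triples) by exact membership in a knowledge base; real RAG systems retrieve evidence and compare by embedding similarity rather than string equality:

```python
# Toy knowledge base of atomic facts as (subject, relation, object) triples.
# The triple format and contents are invented for illustration.
knowledge_base = {
    ("Eiffel Tower", "inaugurated", "1889"),
    ("dog", "located_in", "kitchen"),
    ("cat", "located_in", "kitchen"),
}

def verify(atomic_propositions):
    """Return the propositions unsupported by (i.e. absent from) the base."""
    return [p for p in atomic_propositions if p not in knowledge_base]

# "The dog and the cat are in the kitchen", decomposed into two atomic facts,
# is fully supported:
claims = [("dog", "located_in", "kitchen"), ("cat", "located_in", "kitchen")]
assert verify(claims) == []

# The hallucinated date from the text is caught:
assert verify([("Eiffel Tower", "inaugurated", "1887")]) == [
    ("Eiffel Tower", "inaugurated", "1887")
]
```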

This approach explicitly revives logical atomism, defended by Russell and the early Wittgenstein [Wittgenstein, 1921]. According to this doctrine, the world is constituted of independent atomic facts, and a true proposition corresponds to an atomic fact. By imposing this atomic structure on the material treated by the vector model, one forces the system to respect the logical boundaries of facts.

A Dialectic between Continuity and Discreteness

One observes here a dialectical tension. The connectionist model is powerful precisely because it operates in the continuous and distributed space of vectors. This continuity enables it to generalise, to interpolate, to capture fine semantic nuances. But this power is paid for by a loss of logical control.

The reintroduction of atomic propositions seeks to restore factual precision by sacrificing a part of vector fluidity. It is a partial return to the classical paradigm: discrete units, composable, verifiable.

However, this return is never total. Atomic propositions themselves are identified and treated by connectionist models. Factual verification still rests on embeddings to measure semantic similarity between a generated proposition and the knowledge base. We therefore have not an abandonment of the vector paradigm, but a hybridisation: discrete logical structures anchored in a continuous vector substrate.

This hybridisation suggests that neither of the two paradigms—the logical-symbolic and the connectionist—can claim exclusivity. Classical explicit definitions guarantee rigour, eliminability, epistemic transparency. But they are fragile, rigid, and struggle to capture the contextual richness of natural language.

Vector representations, conversely, are robust, flexible, and capture distributional nuances remarkably well. But they violate classical criteria, produce inferences whose logical validity is not guaranteed, and remain opaque.

The future could therefore reside in neuro-symbolic architectures, where vectors manage fuzzy and intuitive correspondence, whilst discrete logical structures (atomic propositions, knowledge graphs, symbolic rules) manage factual precision and controllability. The definition of the future would then be a hybrid object: a vector embedding anchored to a discrete logical graph.

Conclusion: Towards a Pluralist Theory of Definition

The confrontation between classical theory of definition and the connectionist paradigm does not reveal, as one might have feared, a radical incompatibility. It rather reveals that the notion of “definition” is itself broader and more heterogeneous than the logicist tradition suggested.

Classical theory, as formulated by Pierre Wagner from Pascal, Frege and Russell, has rigorously established the normative requirements of explicit definition in formal systems: eliminability of the defined term in favour of the defining expression, non-creativity or conservativity relative to a theory, univocity of the definiendum. These criteria remain valid and necessary for stipulative definitions in logic and mathematics.

But classical theory itself recognises the existence of alternative forms: implicit definition à la Hilbert, where terms are defined by their mutual relations in a system of axioms; recursive definition, where the defined term appears in the defining expression under certain conditions; and definition by abstraction, which introduces new objects by taking the quotient under an equivalence relation.

The connectionist paradigm extends this plurality by introducing a new form: the statistical distributional definition. In this mode, the meaning of a term is neither a substitutable formula (explicit definition), nor a structural role in a system of axioms (Hilbertian implicit definition), but a geometric localisation in a space of learnt co-occurrences.

This definition possesses distinctive properties that set it radically apart. It is first of all relational: the meaning of a word is nothing but its position relative to all the others. It is then contextual and continuous, varying with the syntactic and semantic environment and expressed in a metric rather than a discrete space. Finally, it is a posteriori and opaque, emerging from observed regularities without offering any articulable explanation.

These properties violate the classical criteria of eliminability and non-creativity. But they do not therefore constitute an abandonment of the notion of definition. They rather reveal that classical theory described a particular case (formal explicit definition) whilst taking it for the general form.

We can therefore propose a typology of definitional modes that distinguishes four approaches. Explicit stipulative definition (Pascal, Frege), characterised by strict eliminability and epistemic transparency, remains the ideal for formal systems. Structural implicit definition (Hilbert), which defines meaning by axiomatic relations, suits mathematical theories. Statistical distributional definition (Firth, connectionism), founded on co-occurrences and inductive generalisation, proves powerful for natural language processing despite its opacity. Finally, ostensive definition (Wittgenstein), which anchors meaning in usage and direct learning, remains fundamental for understanding ordinary language acquisition.

These modes are not mutually exclusive. They correspond to different cognitive and scientific practices, each having its domain of validity.

The connectionist paradigm therefore does not destroy classical theory: it reveals that it described a particular case whilst taking it for the general form. Vector embedding is indeed a form of definition—not in the sense of substitutable identity, but in the sense of structural localisation of a concept in a space of significations.

What connectionism teaches us is that to define is not only to analyse and decompose (classical ideal), but also to situate and relate (structural ideal). Definition is not uniquely an act of fixing meaning, but also a process of contextual construction.

A vertiginous meta-theoretical question remains open: if the notion of “definition” itself covers heterogeneous practices (stipulative, implicit, distributional, ostensive), should we not seek a meta-definition that unifies these modes? Or must we accept that “definition” is itself a polysemous concept, whose meaning varies with theoretical context?

This would be a profound irony: the confrontation with the connectionist paradigm compels us to admit that the very concept of “definition” resists univocal and explicit definition, thereby validating, in a certain way, the central teaching of connectionism: meaning emerges from the network of usages, not from a pre-established essence.

Appendix: Gradient Descent

Gradient descent is the optimisation algorithm that enables neural networks to learn from data. Its principle rests on a simple geometric idea: to minimise a function, it suffices to move in the direction of its steepest decrease.

Mathematical Principle

Consider a model parameterised by a weight vector $\theta \in \mathbb{R}^n$ (where $n$ can reach several billion in contemporary models). The objective of learning is to minimise a cost function $\mathcal{L}(\theta)$ that measures the gap between model predictions and observed data.

The gradient of this function, denoted $\nabla_\theta \mathcal{L}(\theta)$, is a vector that points in the direction of steepest increase of $\mathcal{L}$. To minimise $\mathcal{L}$, one therefore takes steps in the opposite direction:

\[\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t)\]

where $\eta > 0$ is the learning rate, a hyperparameter that controls the amplitude of each step.
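The update rule can be run directly on a toy one-dimensional cost, here $\mathcal{L}(\theta) = (\theta - 3)^2$ with gradient $\nabla \mathcal{L}(\theta) = 2(\theta - 3)$ (an invented example for illustration):

```python
# Minimal illustration of the update rule θ_{t+1} = θ_t − η ∇L(θ_t)
# on the toy cost L(θ) = (θ − 3)².

def grad(theta: float) -> float:
    """Gradient of the toy cost L(θ) = (θ − 3)²."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # initialisation θ₀
eta = 0.1     # learning rate η
for _ in range(100):
    theta -= eta * grad(theta)   # one gradient step

print(round(theta, 6))  # converges towards the minimiser θ* = 3
```

Each step multiplies the distance to the minimiser by $(1 - 2\eta)$, so the iterates contract geometrically towards $\theta^* = 3$.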

Geometric Interpretation

Let us imagine the cost function as a surface in a high-dimensional space. The gradient indicates the local “slope” at each point. The algorithm proceeds by successive descents: starting from a random initialisation $\theta_0$, it iteratively adjusts parameters by “descending the slope” until reaching (ideally) a minimum.

Stochastic Gradient Descent

In practice, calculating the exact gradient on the entire training corpus is prohibitive. One therefore uses stochastic gradient descent (SGD): at each iteration, one estimates the gradient on a small random subset of data (a mini-batch). Although this estimation is noisy, it permits efficient learning on vast corpora.
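A minimal sketch of this procedure on invented one-dimensional data, fitting a single weight $w$ so that $w \cdot x$ approximates $y = 2x$ (all sizes and rates here are illustrative):

```python
import random

# Sketch of stochastic gradient descent: each step estimates the gradient
# of the mean-squared error on a random mini-batch, not the whole dataset.
random.seed(0)
data = [(x, 2.0 * x) for x in range(100)]   # targets follow y = 2x exactly
w = 0.0                                     # a single scalar weight to learn
eta, batch_size = 1e-4, 8                   # learning rate and mini-batch size

for _ in range(2000):
    batch = random.sample(data, batch_size)
    # gradient of (1/m) * sum((w*x - y)**2) with respect to w, on the batch
    g = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
    w -= eta * g

# despite each gradient estimate being noisy, w approaches the true slope 2.0
```

Each mini-batch gives a different, noisy gradient, yet the iterates still converge, which is what makes the method tractable on corpora far too large for exact gradients.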

Epistemological Consequence

Gradient descent reveals the inductive and a posteriori character of connectionist learning. This a priori/a posteriori opposition reactivates a classical philosophical debate: are concepts innate (Cartesian, Leibnizian rationalism) or acquired through experience (Lockean, Humean empiricism)? The connectionist paradigm lends support to the empiricists: meaning is not stipulated by pure reason, but extracted from regularities observed in linguistic usage.

Unlike Hilbertian axioms, which fix the geometric structure a priori as necessary truths or analytic stipulations, neural weights emerge from an iterative process of empirical adjustment on a corpus. The “meaning” of a word in a connectionist model is therefore not stipulated as an immutable essence, but learnt inductively by statistical optimisation. It is linguistic experience, the observed distribution of words in their contexts, that determines the representation, not a prior rational intuition.


References


  • [Devlin et al., 2019] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.
  • [Distrib. Hyp., 2023] A Review of Distributional Hypothesis. ACL.
  • [Firth, 1957] Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955.
  • [Frege, 1884] Frege, G. (1884). Die Grundlagen der Arithmetik.
  • [Mikolov et al., 2013] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. ICLR.
  • [Pascal, 1657] Pascal, B. (1657). De l’esprit géométrique.
  • [Pennington et al., 2014] Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. EMNLP.
  • [Pommeret, 2024] Pommeret, L. (2024). Rapport de recherche IRIF.
  • [Pommeret et al., 2025] Pommeret, L., Lassoued, A., & de Rougemont, M. (2025). Composition with Transformers.
  • [Pommeret et al., 2024] Pommeret, L., Rosset, S., Servan, C., & Ghannay, S. (2024). AtomicEval: Evaluation Framework for Atomic Proposition Autonomy with French Propositioner. JDSE.
  • [Putnam, 1975] Putnam, H. (1975). The Meaning of ‘Meaning’.
  • [Quine, 1951] Quine, W. V. O. (1951). Two Dogmas of Empiricism.
  • [Russell, 1905] Russell, B. (1905). On Denoting.
  • [Vaswani et al., 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. NeurIPS.
  • [Wagner, 2017] Wagner, P. (2017). La définition. https://shs.hal.science/halshs-01494741
  • [Wittgenstein, 1921] Wittgenstein, L. (1921). Tractatus Logico-Philosophicus.
  1. This reflection draws on research work conducted at the Institut de Recherche en Informatique Fondamentale (IRIF) and the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), notably concerning compositionality in transformers and the evaluation of atomic propositions for fact-checking. 

  2. For a modern pedagogical introduction, see [Distrib. Hyp., 2023].

  3. Ludwig Wittgenstein, Philosophical Investigations, §43: “the meaning of a word is its use in the language”. Basil Blackwell edition, translated by G. E. M. Anscombe. 

  4. This distinction goes back to the Logic or the Art of Thinking (1662) by Antoine Arnauld and Pierre Nicole, called the Port-Royal Logic, which opposed “comprehension” (set of attributes contained in the idea) to “extension” (set of objects to which the idea applies).