cosmordinary

In this infinite cosmos we have only one life each. A life in this infinite cosmos is like a speck of dust in the world. Born by chance, dying of necessity. A feeble life and a boundless world: these are the two facts we face. Even if you spend your whole life trying to resolve them, you will get nothing. - 1998

Sunday, April 25, 2004

game theory

summary of some basic concepts of game theory after some reading:

utility
denotes a measure of subjective psychological fulfillment

economically_rational
That is, a player can (i) assess outcomes; (ii) calculate paths to outcomes; and (iii) choose actions that yield their most-preferred outcomes, given the actions of the other players

imperfect_information
all simultaneous-move games are games of imperfect information. Games of perfect information (as the name implies) denote cases where no moves are simultaneous (and where no player ever forgets what has gone before).

strategic_games
Matrix games are referred to as ‘normal-form’ or ‘strategic-form’ games, and games as trees are referred to as ‘extensive-form’ games. When order of play is irrelevant to a game's outcome, you should study its strategic form, since it already captures everything you need to know. Where order of play is relevant, the extensive form must be specified or your conclusions will be unreliable.


NE
A set of strategies is a Nash equilibrium (NE) just in case no player could improve her payoff, given the strategies of all other players in the game, by changing her strategy. Notice how closely this idea is related to the idea of strict dominance: no strategy could be a NE strategy if it is strictly dominated. This implies that if a game has an outcome that is a unique NE, as in the case of joint confession in the PD, that must be its unique solution. This is one of the most important respects in which the PD is an ‘easy’ (and atypical) game.
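To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the reading) that brute-forces the pure-strategy Nash equilibria of a two-player matrix game; the Prisoner's Dilemma payoff numbers used are a common textbook choice, not taken from the text above.

from itertools import product

# Payoffs for a two-player game: payoffs[(row, col)] = (row player's payoff, column player's payoff).
# Strategies: 0 = confess, 1 = stay silent. The numbers are an illustrative textbook choice.
payoffs = {
    (0, 0): (-5, -5),    # both confess
    (0, 1): (0, -10),    # row confesses, column stays silent
    (1, 0): (-10, 0),    # row stays silent, column confesses
    (1, 1): (-1, -1),    # both stay silent
}

def pure_nash_equilibria(payoffs, n=2):
    """Return all strategy profiles where neither player gains by deviating unilaterally."""
    result = []
    for r, c in product(range(n), repeat=2):
        row_pay, col_pay = payoffs[(r, c)]
        row_ok = all(payoffs[(r2, c)][0] <= row_pay for r2 in range(n))
        col_ok = all(payoffs[(r, c2)][1] <= col_pay for c2 in range(n))
        if row_ok and col_ok:
            result.append((r, c))
    return result

print(pure_nash_equilibria(payoffs))   # [(0, 0)] -- joint confession is the unique NE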

Production-Possibility Frontier
Production efficiency occurs when production of one good cannot be increased without curtailing production of another good. This is illustrated by the PPF. When an economy is operating efficiently -on its PPF- it can produce more of one good only by producing less of another. The PPF illustrates many basic economic processes: how economic growth pushes out the frontier, how a nation chooses relatively less food and other necessities as it develops, how a country chooses between private goods and public goods, and how societies choose between consumption goods and capital goods that enhance future consumption. Societies are sometimes inside their PPF. When unemployment is high, or when revolution or inefficient government regulations hamper economic activity, the economy is inefficient and operates inside the PPF.

Friday, April 23, 2004

thematic/semantic-roles/relations

Semantic correlates of syntactic roles
syntactic: subject, direct object, indirect object, prepositional objects
semantic: agent, patient, benefactive, instrument-loc-source

Roles Suggested in Linguistic Work
Fillmore's (1968) 5 roles
agentive: typically animate perceived instigator
instrumental: inanimate force or object causally involved
dative: animate being affected by the state or action
locative: location or spatial orientation of the action
objective: other entities involved in the action

Frawley's (1992) 4 roles
logical actors: agent, author, and instrument
logical recipients: patient, experiencer, and benefactive
spatial role: theme, source, and goal
non-participant roles: locative, reason, and purpose

Jackendoff's (1972) 3 primitive roles:
CAUSE, CHANGE and BE later adopted by Dowty (1979) in Montague Grammar.

Dowty's (1989) 2 `thematic-role-like concepts' for verbal predicates:
the proto-agent and proto-patient role.
subjecthood vs. objecthood, agentivity vs. patientivity
Contributing properties for the proto-agent role
-- Volition; sentience (and/or perception); causes event; movement.
Contributing properties for the proto-patient role
-- Change of state (including coming-to-being, going-out-of-being); incremental theme (i.e. determinant of aspect); causally affected by event; stationary (relative to movement of proto-agent).

Sowa's (1984) "Conceptual Structures" text
two dozen or so thematic relations and 37 conceptual relations [url]
Sowa's (1999) "Knowledge Representation" text
19 thematic roles [url=http://www.jfsowa.com/ontology/thematic.htm]

Upper Cyc Ontology (1997)
100+ thematic roles [url=http://www.cyc.com]

FrameNet II: countless frames
every sense of every word (i.e., every lexical unit) has its own frame. [url=http://www.icsi.berkeley.edu/~framenet]

A list of the most popular semantic/thematic-roles/relations and the properties usually associated with them is given below.
Agent: A participant which the meaning of the verb specifies as doing or causing something, possibly intentionally. Examples: subjects of kill, eat, hit, smash, kick, watch.

Patient: A participant which the verb characterizes as having something happen to it, and as being affected by what happens to it. Examples: objects of kill, eat, smash but not those of watch, hear, love.

Experiencer: A participant who is characterized as aware of something. Examples: subject of love, object of annoy.

Theme: A participant which is characterized as changing its position or condition, or as being in a state or position. Examples: objects of give, hand, subjects of walk, die.

Location: The thematic role associated with the NP expressing the location in a sentence with a verb of location. Examples: subjects of keep, own, retain, know, locative PPs.

Source: Object from which motion proceeds. Examples: subjects of buy, promise, objects of deprive, free, cure.

Goal: Object to which motion proceeds. Examples: subject of receive, buy, dative objects of tell, give. (adapted from [Dow89])

Thursday, April 22, 2004

Term Weight

Not every term is equally important to its document. Terms need to be weighted, be it with binary weights, raw term frequency, or TF-IDF. Under TF-IDF, a term's weight is determined by how often it occurs in a given document and by the inverse of how many documents contain it, which is intuitively plausible. Mathematically:

Weight(Wi, Doc) = TF(Wi, Doc)*IDF(Wi) = TF(Wi, Doc)*log(N/n + 1)

where TF(Wi, Doc) is the frequency of word Wi in document Doc, IDF(Wi) is the inverse document frequency of Wi, N is the total number of documents, and n is the number of documents containing Wi.
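As a rough sketch of this weighting in Python (assuming the log(N/n + 1) reading of the formula above; the toy documents are invented):

import math
from collections import Counter

# Invented toy corpus: each document is a list of words.
docs = [
    "latent semantic indexing reduces the term space".split(),
    "the term weight reflects raw term frequency".split(),
    "search engines index billions of web pages".split(),
]

def tfidf(term, doc, docs):
    """Weight(Wi, Doc) = TF(Wi, Doc) * log(N/n + 1), as in the formula above."""
    tf = Counter(doc)[term]                     # frequency of the term in this document
    n = sum(1 for d in docs if term in d)       # number of documents containing the term
    return tf * math.log(len(docs) / n + 1) if n else 0.0

print(tfidf("term", docs[1], docs))   # frequent here, but it also appears in another document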

However, since term-by-document matrices are usually sparse and high-dimensional, conventional vector space models are susceptible to noise (differences in word usage, terms that do not help distinguish documents, etc.) and have difficulty capturing the underlying semantic structure. Additionally, the computing resources necessary for storing and processing such data are enormous. Dimensionality reduction is a way to overcome these problems: vectors are mapped from a high-dimensional space to a much lower-dimensional one. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition. Latent semantic indexing (LSI) uses SVD to reduce the vector space.

6. explain LSI
Here we can make an analogy to help us understand the principles of VSM and LSI. Suppose you want to know what on earth Cognitive Science is, and more concretely which subjects contribute to it. You send people questionnaires listing 10 subjects and ask them to rank each from 10 (most important to Cognitive Science) to 1 (least important). You note down the rankings of 3 of these subjects --- linguistics, psychology and neuroscience. You can graph the results of your survey by setting up a chart with three orthogonal axes, one for each keyword. To plot a particular questionnaire, you take its rank for each of the 3 keywords and move the corresponding number of steps (1-10) along that keyword's axis. Each questionnaire thus forms a vector in this space, with its direction and magnitude determined by how highly the three keywords are ranked in it. When you are finished, you get a cluster of points in this three-dimensional space.

If you draw a line from the origin of the graph to each of these points, you obtain a set of vectors in the “Cognitive Science” space. The size and direction of each vector tells you how high the ranks of the three key items were in any particular questionnaire, and the set of all the vectors taken together tells you something about the kind of Cognitive Science people favour.

We chose 3 keywords, so we get a 3-D visualization. However, this space could have any number of dimensions, depending on how many keywords we chose to use. If we were to include Artificial Intelligence and Anthropology as well, we would have a 5-dimensional term space. In the real world this term space usually has many thousands of dimensions. Each document in our collection is a vector with as many components as there are content words. Although we cannot visualize them, the basic idea is the same as in the Cognitive Science example. If we use binary term weights (that is, whether or not the term occurs in the document), then intuitively documents that have many words in common will have vectors that are near to each other, while documents with few shared words will have vectors that are far apart.
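A minimal sketch of this intuition (my own illustration): documents as binary term vectors, with nearness measured by cosine similarity; the toy sentences are invented.

import math

# Invented toy documents.
docs = {
    "d1": "linguistics and psychology contribute to cognitive science",
    "d2": "psychology and neuroscience contribute to cognitive science",
    "d3": "search engines index billions of web pages",
}

vocab = sorted({w for text in docs.values() for w in text.split()})

def binary_vector(text):
    """One component per vocabulary word: 1 if the word occurs in the document, else 0."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = {name: binary_vector(text) for name, text in docs.items()}
print(cosine(vecs["d1"], vecs["d2"]))   # high: many shared words, vectors point in similar directions
print(cosine(vecs["d1"], vecs["d3"]))   # low: no shared words, vectors are far apart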

[5.part of termpaper submitted 2004.04 Corpus-based Semantics]

Singular Value Decomposition (SVD)

Latent semantic indexing works by projecting this large, multidimensional space down into a smaller number of dimensions. Keywords that are similar in meaning get squeezed together and lose their distinctiveness. This method of data compression is what allows LSI to go beyond conventional VSMs, and it is done with SVD. With SVD, every matrix A can be written as
A = (hanger)*(stretcher)*(aligner)
The formal explanation of SVD is beyond the scope of this paper. Readers are encouraged to read a comprehensive online tutorial of SVD, which starts from matrix action and goes all the way to noise filtering (http://www.acm.org/sigmm/MM98/electronic_proceedings/huang/node4.html).
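In the usual notation the three factors are U (the hanger), a diagonal matrix of singular values S (the stretcher), and the transpose of V (the aligner). A minimal numpy sketch on a made-up term-document matrix:

import numpy as np

# Made-up 5-term x 4-document count matrix, purely for illustration.
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

# A = U * diag(s) * Vt, i.e. hanger * stretcher * aligner.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                       # singular values, sorted from largest to smallest
print(np.allclose(U @ np.diag(s) @ Vt, A))     # True: the three factors reproduce A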

The working principle of SVD is to reduce dimensions while preserving the relative distances between document vectors as much as possible. In this dimension reduction, information is lost and content words are laid over on one another. Information loss sounds harmful, but here it is a good thing. While losing noise from our original term-document matrix, we also reduce the dimensionality to a manageable extent. After SVD, similar things become more similar, while dissimilar things remain distinct. Synonyms get superimposed, while polysemous words get separated. This reductive mapping is what gives LSI its seemingly intelligent ability to correlate semantically related terms. We are really exploiting a property of natural language, namely that words with similar meanings tend to occur together.

To make things easier to understand, take a simple example: reducing a 2-D space to 1-D. This happens to be called “linear regression” in statistics. Given a set of n points (Xi, Yi), the optimal characterization is a line f(X) = mX + b which is closest to all points in the set. This line also preserves the relative distances between the points to an optimal extent. When facing a 3-D space, you are looking for an optimal mapping of the points onto a plane, again preserving their relative distances as much as possible.
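For instance, with a handful of invented points, the least-squares line can be found directly:

import numpy as np

# Invented 2-D points; the best-fit line is the 1-D summary that stays closest
# to all of them in the least-squares sense.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

m, b = np.polyfit(x, y, deg=1)   # slope m and intercept b of f(X) = m*X + b
print(m, b)                      # roughly 1.0 and 0.0 for these points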

A typical term space might have tens of thousands of dimensions and be projected down into fewer than 150. In an n-dimensional space the principle is nevertheless exactly the same. When the term-document matrix is decomposed into three matrices, the stretcher (the diagonal matrix) describes the dimensions of variation. The dimensions are thus ranked: the first value describes the dimension with the largest variation, the second the dimension with the second largest variation, and so on. The lowest-ranked dimensions are deleted, since they account for the least variation. After the reduction, the original matrix is reconstructed. I will not reproduce the standard examples here, but examples obviously instantiate the whole procedure and help understanding.
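Continuing the numpy sketch from above, the reduction keeps only the k largest singular values and rebuilds the matrix from them (the choice k = 2 here is arbitrary):

import numpy as np

def lsi_reduce(A, k):
    """Rank-k approximation of a term-document matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k dimensions with the largest variation, drop the rest.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(6, 5)             # stand-in for a (tiny) term-document matrix
A_k = lsi_reduce(A, k=2)
print(np.linalg.matrix_rank(A_k))    # 2: every document vector now lives in a 2-dimensional subspace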

The SVD algorithm has a complexity of O(N^2 * k^3), where N is the number of terms plus documents, and k is the number of dimensions in the concept space. Typically k is small, ranging anywhere from 50 to 350. However, N grows rapidly as the numbers of terms and documents increase. This makes the SVD algorithm infeasible for a large, dynamic collection.

[7. part of term paper submitted 2004.4 Corpus-based Semantics]

Wednesday, April 21, 2004

Construction Grammar and the Neural Theory of Language

A leading proponent of CG -- Adele E. Goldberg. http://www.linguistics.uiuc.edu/agoldbrg/ (summary of Goldberg's nifty 2000 encyclopedia entry, quoted)
- a construction is a unique form-meaning/use pair
"A CONSTRUCTION is defined to be a pairing of form with meaning/use such that some aspect of the form or some aspect of the meaning/use is not strictly predictable from the component parts or from other constructions already established to exist in the language.

-a frame is a construct that links semantics and syntax
"A frame is an intuitive construct that allows us to formalize the links between semantics and syntax in the results of lexical analysis. Semantic frames are schematic representations of situations involving various participants, props, and other conceptual roles, each of which is a frame element. " (framenet website)

- shares basic ideas with HPSG, Cognitive Grammar, and Montague Grammar
"Construction Grammar shares the basic and fundamental idea that the construction
(or sign) is central to an account of language with several other current theories including
Head Driven Phrase Structure Grammar, cognitive Grammar, and Montague Grammar. "

- declarative, constraint-based, generative, not derivational (mono-stratal)
"CG is declarative, mono-stratal representation, no derivations are posited" Declarative mood is an epistemic mood that signals that the proposition expressed by a speaker's utterance is offered as an unqualified statement of fact.


- e.g. Elena faxed Ken the letter.
"the sentence instantiates the subject-predicate construction, the ditransitive construction, the determiner construction (the letter), the past tense morphological construction (fax-ed) and five simple morphological constructions, corresponding to each word in the sentence:"

- integrated information: CG includes everything!
phonetics + syntax + semantics + pragmatics + dialect variation + discourse
"Construction Grammar does not assume that syntax is generally isolated or isolatable from semantics or conditions of use. Construction Grammar also eschews a strict division between the pragmatic and the semantic. Generalizations about particular arguments being topical, focused, inferable, etc. are also stated as part of the constructional representation. Facts about the use of entire constructions, including register, dialect variation, etc. are stated as part of the construction as well. ... Construction Grammar aims to account for the full range of facts of any language, without assuming that a particular subset of the data is part of a privileged “core”." by what? Speculation?


- constructions are related
relations mentioned by Goldberg: "inheritance, sisters" -- any others? e.g. the constructions for restrictive and non-restrictive relative clauses, topicalization, and wh-questions are inherited from the left isolation construction.

- unification-based formalism
"Thus each construction is represented by an Attribute-Value Matrix (AVM). Each attribute can have at most one value. Attributes may be n-ary, or may be feature structures themselves."

- data: "corpus, introspection, psycholingistic experiment"

- Argument Structure Constructions:
constructions that encode the form and function of simple basic clause types in a language, such as the transitive, ditransitive and resultative constructions. The argument structure constructions provide the basic sentence patterns of a language and directly reflect basic types of frames of experience, such as something causing something else to move, someone causing someone to receive something, something moving somewhere, someone causing something to change state, etc.

more on NTL: http://www.icsi.berkeley.edu/NTL/publications.html

[Writeup on CG-talk by Y. Zhao in J. Bach's MindBuilding Seminar WS03/04 Osnabrück ]


Search engine vs. classic IR

Attractive and useful as they are, search engines are often confused with free-text retrieval systems. In fact, retrieval from web pages is quite different from classic IR. Basically they differ in the following aspects:

Firstly, they differ in data volume. A classic IR system has a capacity of several GB, while search engines must deal with billions of web pages. Search engines therefore usually employ several servers for web-page indexing, which is not necessary for normal enterprises.

Secondly, they differ in ranking. Search engines such as Google have their own special ranking algorithms, based not only on the relevance of content but also on the importance of the web pages. A real enterprise application is concerned only with relevance of content: given a query, the most relevant information should be placed first. Link analysis is therefore of little use there.

Thirdly, enterprise IR systems are usually real-time applications: once the data has changed, retrievals should reflect those changes. Search engines, on the other hand, have their indexing module and retrieval module separated, and indexes are updated only periodically. Large search engines like Google need about 28 days.

Fourthly, since both the data and the user groups are huge and complex, search engines have great difficulty applying computation-intensive techniques from data mining and classic IR. At present, most search engines simply match keywords. It is much easier, by contrast, to apply intelligent and individualized techniques in user-specific and data-specific enterprise IR systems.

[2. part of term paper submitted 2004.04 Corpus-based Semantics]