Language is about the association between two very different domains, that of meaning (including everything anybody might ever want to talk about) and that of linguistic form (phonetic/orthographic units, words, sentences, discourses). Language users must somehow get from one domain to the other, and because the mapping between them is very complex, this is hard.
Meaning includes not only events, states, and objects (all possibly hypothetical) and the relations (temporal, causal, etc.) among the events and states, but also the goal of the language producer in referring to them.
Learning labels and grammatical patterns that stand for concepts seems to make us "symbolic" (in some sense) and possibly to affect the way we parse the world.
Units (other than the smallest units) consist of constituents. Constituency matters because it is meaningful; meanings are apparently created and extracted on the basis of something like compositionality.
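A minimal sketch of why constituency is meaningful, assuming a toy set-based semantics that is not part of these notes: the individuals, the sets MEN, WOMEN and OLD, and the function meaning are all hypothetical. The same three words, "old men and women", receive different meanings depending on how they are grouped into constituents, and each meaning is computed only from the meanings of the parts.

```python
# Toy "world": who belongs to which category (illustrative assumption).
MEN = {"Al", "Bo"}
WOMEN = {"Cy", "Di"}
OLD = {"Al", "Cy"}

WORD_MEANINGS = {"men": MEN, "women": WOMEN}


def meaning(tree):
    """Compute the set denoted by a constituent from its parts."""
    if isinstance(tree, str):           # smallest unit: look the word up
        return WORD_MEANINGS[tree]
    operator, *parts = tree
    if operator == "old":               # "old X" = the old members of X
        return meaning(parts[0]) & OLD
    if operator == "and":               # "X and Y" = members of X or Y
        return meaning(parts[0]) | meaning(parts[1])
    raise ValueError(f"unknown operator: {operator}")


# Two constituent structures for the same word string.
old_scopes_over_both = ("old", ("and", "men", "women"))
old_scopes_over_men = ("and", ("old", "men"), "women")

print(sorted(meaning(old_scopes_over_both)))  # ['Al', 'Cy']
print(sorted(meaning(old_scopes_over_men)))   # ['Al', 'Cy', 'Di']
```

The word string alone does not determine the meaning; the bracketing does.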
Dealing with linguistic units is complicated by the fact that language happens in time. Spoken linguistic forms, once produced, are lost. Both generators and analyzers have to store some representation of these forms in short-term memory.
The boundaries between constituents are often not obvious. For example, in spoken language words are usually not separated by gaps. Language analyzers need to segment the input.
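The segmentation problem can be illustrated with a small sketch, assuming the analyzer has a known lexicon and receives its input as an unbroken string of letters (a stand-in for gapless speech). The lexicon and the function name segmentations are hypothetical. Even this tiny case has more than one possible segmentation.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words drawn from `lexicon`."""
    if not text:
        return [[]]                      # one way to segment nothing
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word and segment whatever is left over.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


LEXICON = {"no", "now", "here", "where", "nowhere"}

for words in segmentations("nowhere", LEXICON):
    print(" ".join(words))
# no where
# now here
# nowhere
```

Choosing among the candidates requires information beyond the lexicon, such as context or prosody.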
Language generation works by converting concepts into linguistic units. This involves a sort of semantic "segmentation" and a mapping of these segments onto linguistic units.
Identifying the structure of a chunk of language involves both segmentation, finding boundaries between constituents, and aggregation, combining elements into larger units. The clues to how this is done may not be obvious.
Generating language presumably involves aggregation of smaller semantic units into larger ones.
Constituents may be embedded within other constituents, and an embedded constituent may be of the same type as the constituent it is embedded in, leading to the possibility of recursive structure. This is challenging for some types of models or physical devices (nervous systems?), which in turn raises the question of whether language really is recursive.
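As a sketch of same-type embedding, the hypothetical function below builds a noun phrase that contains a relative clause, which itself contains another noun phrase, and so on. It assumes a single fixed English pattern ("the N that V NP") and is only meant to show how one rule, applied to its own output, yields unbounded depth.

```python
def noun_phrase(nouns, verbs):
    """Build a noun phrase; each extra noun adds one level of embedding."""
    head = f"the {nouns[0]}"
    if len(nouns) == 1:
        return head                      # base case: no embedded clause
    # Recursive case: the relative clause contains another noun phrase.
    embedded = noun_phrase(nouns[1:], verbs[1:])
    return f"{head} that {verbs[0]} {embedded}"


print(noun_phrase(["dog"], []))
# the dog
print(noun_phrase(["dog", "cat", "mouse"], ["chased", "ate"]))
# the dog that chased the cat that ate the mouse
```

A device without a stack or an equivalent memory has to handle each depth as a separate case, which is one way to see why recursion is a challenge for some models.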
Recognizing and generating discourse involves analogical mappings between structures.
Language is structured in different ways on at least two different levels, phonological and morphosyntactic. Understanding how phonology works may not be of much help in understanding how morphosyntax works, and understanding how morphosyntax (and semantics) works may not be of much help in understanding how discourse works.
Linguistic units belong to categories. Some of these, such as words, are directly involved in language analysis and generation. Others, such as syllables, are in the service of analysis or generation. Learning and recognizing categories means solving the invariance problem, discovering what matters and what does not for each category.
Phonological categories, especially phonemes, are notoriously variable, depending on the phonetic context, the speaker's age and gender, and global properties of the utterance. The invariance problem is the problem of establishing what it is that makes, say, a /p/ a /p/ and what it is that's irrelevant and must be factored out in identifying consonants.
Morphemes are the meaningful units of language, but the form of a morpheme may vary considerably, depending on the morphemes around it. In addition to identifying morphemes, a language analyzer may also need to identify the syntactic category that a word belongs to. This may be difficult because of ambiguity (discussed below).
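The variation in a morpheme's form can be illustrated with the regular English plural, whose pronunciation depends on the final sound of the stem. The sketch below is a simplification under stated assumptions: stems are represented only by their final phoneme, the phoneme sets are abbreviated, and the function name plural_allomorph is hypothetical.

```python
# Simplified phoneme classes (illustrative, not a full inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_phoneme):
    """Pick the form of the regular plural from the stem's final sound."""
    if final_phoneme in SIBILANTS:
        return "ɪz"                      # as in "buses"
    if final_phoneme in VOICELESS:
        return "s"                       # as in "cats"
    return "z"                           # as in "dogs"


for word, final in [("cat", "t"), ("dog", "g"), ("bus", "s")]:
    print(word, "+ plural ->", plural_allomorph(final))
```

An analyzer faces the inverse task: recognizing three different sound sequences as the same morpheme.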
In general, the language generator's meaning is underspecified. Analyzers have to work to fill in the gaps. This requires making use of the linguistic and non-linguistic context and using both knowledge of language and general world knowledge.
Units of language are often ambiguous; that is, they have multiple interpretations. Words may have multiple meanings, pronouns may have more than one candidate referent, sentences may be assigned more than one structure, and the relations among groups of sentences may not be explicitly signaled.
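Structural ambiguity can be made concrete by counting the structures a grammar assigns to a sentence. The sketch below uses a CKY-style chart over a toy grammar in Chomsky normal form; the grammar, the lexicon, and the function count_parses are assumptions chosen for illustration. The classic prepositional-phrase attachment example gets two structures.

```python
from collections import defaultdict

# Toy grammar: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the verb phrase
    ("NP", "NP", "PP"),   # PP modifies the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}


def count_parses(words, root="S"):
    """Count the distinct trees rooted in `root` that span all of `words`."""
    n = len(words)
    chart = defaultdict(int)             # (start, end, category) -> count
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1, category)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


print(count_parses("I saw the man".split()))                     # 1
print(count_parses("I saw the man with the telescope".split()))  # 2
```

The two structures correspond to the telescope being the instrument of seeing or a property of the man; the grammar alone cannot choose between them.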
Languages are also often redundant in very specific and constrained ways. This may make language analysis easier, but language generators must adhere to these constraints in order to produce grammatical utterances.
Language generators can generate and language analyzers can analyze sentences and discourses that they've never heard before by recombining units in novel ways, using familiar patterns of combination. That is, language is productive. Similarly, the ability to analyze or meaningfully generate a particular sentence apparently implies the ability to analyze or meaningfully generate a grammatical rearrangement of the sentence. That is, language is systematic. Whatever form it takes, knowledge of language has to permit generalization. Whether this involves only interpolation between known examples or extrapolation beyond them (more challenging) is not so clear.
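Productivity and systematicity can be sketched with a deliberately tiny model: five words and one pattern of combination. The word lists, the pattern, and the set of sentences marked as "heard" are all hypothetical. Most of what the pattern generates has never been encountered, and any generable sentence has a generable counterpart with subject and object swapped.

```python
from itertools import product

NOUNS = ["dog", "cat", "bird"]
VERBS = ["chased", "saw"]


def all_sentences():
    """Every sentence licensed by the pattern 'the N V the N'."""
    return [
        f"the {subj} {verb} the {obj}"
        for subj, verb, obj in product(NOUNS, VERBS, NOUNS)
    ]


# Sentences assumed to have been encountered before.
heard = {"the dog chased the cat", "the bird saw the dog"}

sentences = all_sentences()
novel = [s for s in sentences if s not in heard]

print(len(sentences))                         # 18
print(len(novel))                             # 16
print("the cat chased the dog" in sentences)  # True
```

Here generalization amounts to filling familiar slots with familiar words; whether human generalization goes beyond this kind of recombination is the open question raised above.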
Languages obviously differ from one another in various ways, and dialects within languages differ from one another (in fact there is no rigorous way to distinguish languages from dialects).
A language can be seen as a way of slicing up a continuous reality into a finite set of lexical and grammatical categories. But each language does this in its own way.
The relationship between words or structures in different languages is often one-to-many, many-to-one, or more complex than that.
Obviously languages are learned, though how much is learned is a matter of considerable disagreement. The problem is that the input to the learner seems to underspecify what needs to be learned; that is, the range of possible "hypotheses" that are compatible with the input is too large. If the input to machines learning natural language is "natural", they will face similar problems. Apparently some sort of constraints are needed on what can be learned. But what sort?
Children often learn the meanings of words on the basis of very few presentations. Without constraints on what is a possible meaning, this seems impossible.
Multiple grammars seem to be compatible with the input children receive. Without constraints on what is a possible grammar, it seems impossible to learn grammar.
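The underdetermination of grammar by input can be sketched with an artificial language rather than real child-directed speech. The observed strings and both hypothesized grammars below are assumptions for illustration. Both grammars accept everything observed, yet they disagree about strings the learner has not encountered.

```python
import re

# The learner's input: positive examples only.
OBSERVED = ["ab", "aabb", "aaabbb"]


def grammar_counting(s):
    """Hypothesis A: n a's followed by exactly n b's (n >= 1)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def grammar_loose(s):
    """Hypothesis B: one or more a's followed by one or more b's."""
    return re.fullmatch(r"a+b+", s) is not None


# Both hypotheses are compatible with all of the input.
print(all(grammar_counting(s) for s in OBSERVED))  # True
print(all(grammar_loose(s) for s in OBSERVED))     # True

# They diverge on unseen strings.
for s in ["aab", "abb"]:
    print(s, grammar_counting(s), grammar_loose(s))
# aab False True
# abb False True
```

No amount of further positive input of the same kind rules out Hypothesis B, since every string that A accepts is also accepted by B; something other than the input has to favor one over the other.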
Understanding and producing language require the simultaneous use of multiple levels of linguistic and non-linguistic knowledge and reasoning, including reasoning about the beliefs of the speaker/hearer (theory of mind).