Showing posts with label child. Show all posts

Tuesday, 4 October 2022

A desire for clickbait can hinder an academic journal's reputation

 


On 28th September, I woke up, looked at Twitter, and found Pete Etchells fulminating about a piece in the Guardian.  

It was particularly galling for him to read a piece that implied research studies had shown that voice-responsive devices were harming children’s development when he and Amy Orben had provided comments to the Science Media Centre that were available to the journalist. They both noted that: 

a) This was a Viewpoint piece, not new research 

b) Most of the evidence it provided consisted of anecdotes from newspaper articles

I agreed with Pete’s criticism of the Guardian, but having read the original Viewpoint in the Archives of Disease in Childhood, I had another question, namely, why on earth was a reputable paediatrics journal doing a press release on a flimsy opinion piece written by two junior medics with no track record in the area? 

So I wrote to the Editor with my concerns, as follows: 

Dear Dr Brown 

Viewpoint: Effects of smart voice control devices on children: current challenges and future perspectives doi 10.1136/archdischild-2022-323888 Journal: Archives of Disease in Childhood  

I am writing to enquire why this Viewpoint was sent out to the media under embargo as if it was a substantial piece of new research. I can understand that you might want to publish less formal opinion pieces from time to time, but what I cannot understand is the way this was done to attract maximum publicity by the media. 

The two people who commented about it for the Science Media Centre both noted this was an opinion piece with no new evidence, relying mainly on media reports. 

https://www.sciencemediacentre.org/expert-reaction-to-an-opinion-piece-on-voice-controlled-devices-and-child-development/ 

Unfortunately, despite this warning, it has been picked up by the mainstream media, where it is presented as ‘new research’, which will no doubt give parents of young children something new to worry about. 

I checked out the authors, and found these details: 

https://orcid.org/0000-0003-4881-8293 

https://www.researchgate.net/profile/Ananya-Arora-3 

These confirm that neither has a strong research track record, or any evidence of expertise in the topic of the Viewpoint. I can only assume that ADC is desperate for publicity at any cost, regardless of scientific evidence or impact on the public. 

As an Honorary Fellow of the Royal College of Paediatrics and Child Health, and someone who has previously published in ADC, I am very disappointed to see the journal sink so low. 

Yesterday I got a reply that did nothing to address my concerns. Here’s what the editor, Nick Brown*, said (in italic), with my reactions added: 

Thank you for making contact. My response reflects the thoughts of both the BMJ media and publication departments.

Given my reactions, below, this is more worrying than reassuring. It would be preferable to have heard that there had been some debate as to the wisdom of promoting this article to the press. 

It is a key role of a scientific journal to raise awareness of, and stimulate debate on, live and emerging issues. Voice control devices are becoming increasingly common and their impact on children's development is a legitimate topic of discussion.  

I have no quarrel with the idea that impact of voice control devices on children is a legitimate topic for the journal. But I wonder about how far its role is ‘raising awareness of, and stimulating debate’ when the topic is one on which we have very little evidence. A scientific journal might be expected to provide a balanced account of evidence, whereas the Viewpoint presented one side of the ‘debate’, mainly using anecdotes. I doubt it would have been published if it had concluded that there was no negative impact of voice control devices.  

Opinion pieces are part of a very wide range of content that is selected for press release from among BMJ's portfolio of journals. They are subject to internal review in line with BMJ journals' overall editorial policy: the process (intentionally) doesn't discriminate against authors who don't have a strong research track record in a particular field.

I’ve been checking up on how frequently ADC promotes an article for press release. This information can be obtained here. This year, they have published 219 papers, of which three other articles have merited a press release: an analysis of survey data on weight loss (July), a research definition of Long Covid in children (February) and a data-based analysis of promotional claims about baby food (February). Many of the papers that were not press-released are highly topical and of general interest – a quick scan found papers on vaping, monkey pox, transgender adolescents, unaccompanied minors as asylum seekers, as well as many papers relating to Covid. It’s frankly baffling why a weakly evidenced viewpoint on a topic with little evidence was selected as meriting special treatment with a press release. 

As for the press release pathway itself, all potential pieces are sent out under embargo, irrespective of article type. This maximises the chances of balanced coverage: an embargo period enables journalists to contact the authors with any queries and to contact other relevant parties for comment. 

My wording may have been clumsy here and led to misunderstanding. My concern was more with the fact that the paper was press-released, which is, as established above, highly unusual, rather than with the embargo.  

The press release clearly stated (3 times) this article was a viewpoint and not new research, and that it hadn't been externally peer reviewed. We also always include a direct URL link to the article in question in our press releases so that journalists can read the content in full for themselves. 

I agree that the press release included these details, and indeed, had journalists consulted the Science Media Centre’s commentaries, the lack of peer review and data would have been evident. But nevertheless, it’s well-known that (a) journalists seldom read original sources, and (b) some of the less reputable newspapers are looking for clickbait, so why provide them with the opportunity to sensationalise journal content?

While we do all we can to ensure that journalists cover our content responsibly, we aren't responsible for the manner in which they choose to do so. 

I agree that part of the blame for the media coverage lies with journalists. But I think the journal must bear some responsibility for the media uptake of the article. It’s a reasonable assumption that if a reputable journal issues a press release, it’s because the article in question is important and provides novel information from recognised experts in the field. It is unfortunate that this assumption was not justified in this case. 

I just checked to see how far the media interest in the story had developed. The Guardian, confronted with criticism, changed the lede to say “Researchers suggest”, rather than “New research says”, but the genie was well out of the bottle by that time. The paper has an Altmetric ‘attention’ score of 1577, and has been picked up by 209 news outlets. There’s no indication that the article has “stimulated debate”. Rather, it has been interpreted as providing a warning about a new danger facing children. The headlines, which can be found here, are variants of: 

 “Alexa and Siri make children rude” 

“Siri, Alexa and Google Home could hinder children’s social and cognitive development” 

“Voice-control devices may have an impact on children’s social, emotional development: Study” 

“According to a study, voice-controlled electronic aides can impair children’s development” 

“Experts warn that AI assistants affect children’s social development” 

“Experts warn AI assistants affect social growth of children” 

“Why Alexa and Siri may damage kids’ social and emotional development” 

“Voice assistants harmful for your child’s development, claims study” 

“Alexa, Siri, and Other Voice Assistants could negatively rewire your child’s brain” 

“Experts warn using Alexa and Siri may be bad for children” 

“Parents issued stark warning over kids using Amazon’s Alexa” 

“Are Alexa and Siri making our children DUMB?” 

“Use of voice-controlled devices ‘might have long-term consequences for children’” 

And most alarmingly, from the Sun: 

“Urgent Amazon Alexa warning for ALL parents as new danger revealed” 

Maybe the journal’s press office regards that as a success. I think it’s a disaster for the journal’s reputation as a serious academic publication. 

 

*Not the sleuth Nick Brown. Another one. 

Saturday, 9 June 2018

Developmental language disorder: the need for a clinically relevant definition

There's been debate over the new terminology for Developmental Language Disorder (DLD) at a meeting (SRCLD) in the USA. I've not got any of the nuance here, but I feel I should make a quick comment on one issue I was specifically asked about, viz:

[Embedded tweet, not reproduced here: a second-hand report of Rice's argument for retaining the term SLI]
As background: the field of children's language disorders has been a terminological minefield. The term Specific Language Impairment (SLI) began to be used widely in the 1980s as a diagnosis for children who had problems acquiring language for no apparent reason. One criterion for the diagnosis was that the child's language problems should be out of line with other aspects of development, and hence 'specific'; this was interpreted as requiring a nonverbal IQ in the normal range.

The term SLI was never adopted by the two main diagnostic systems, WHO's International Classification of Diseases (ICD) and the American Psychiatric Association's Diagnostic and Statistical Manual (DSM), but the notion that IQ should play a part in the diagnosis became prevalent.

In 2016-7 I headed up the CATALISE project with the specific goal of achieving some consensus about the diagnostic criteria and terminology for children's language disorders: the published papers about this are openly available for all to read (see below). The consensus of a group of experts from a range of professions and countries was to reject SLI in favour of the term DLD.

Any child who meets criteria for SLI will meet criteria for DLD: the main difference is that the use of an IQ cutoff is no longer part of the definition. This does not mean that all children with language difficulties are regarded as having DLD: those who meet criteria for intellectual disability, known syndromes or biomedical conditions are treated separately (see these slides for summary).

The tweet seems to suggest we should retain the term SLI, with its IQ cutoff, because it allows us to do neatly controlled research studies. I realise a brief, second-hand tweet about Rice's views may not be a fair portrayal of what she said, but it does emphasise a bone of contention that was thoroughly gnawed in the discussions of the CATALISE panel, namely, what is the purpose of diagnostic terminology? I would argue its primary purpose is clinical, and clinical considerations are not well-served by research criteria.

The traditional approach to selecting groups for research is to find 'pure' cases: if you include children who have other problems beyond language (including other neurodevelopmental difficulties), then it is much harder to know how far you are assessing correlates or causes of language problems: things get messy and associations become hard to interpret. The importance of controlling for nonverbal IQ has been particularly emphasised over many years: if you compare language-impaired children with typically-developing comparison children on a language or cognitive measure, and the language-impaired group has lower nonverbal ability, then you may be looking at a correlate of nonverbal ability rather than of language. Restricting consideration to those who meet stringent IQ criteria, so as to equalise the groups, is one way of addressing the issue.

However, there are three big problems with this approach:

1. A child's nonverbal IQ can vary from time to time and it will depend on the test that is used. However, although this is problematic, it's not the main reason for dropping IQ cutoffs; the strongest arguments concern validity rather than reliability of an IQ-based approach.

2. The use of IQ-cutoffs ignores the fact that pure cases of language impairment are the exception rather than the rule. In CATALISE we looked at the evidence and concluded that if we were going to insist that you could only get a diagnosis of DLD if you had no developmental problems beyond language, then we'd exclude many children with language problems (see also this old blogpost). If our main purpose is to get a diagnostic system that is clinically workable, it should be applicable to the children who turn up in our clinics - not just a rarefied few who meet research criteria. An analogy can be drawn with medicine: imagine if your doctor identified you with high blood pressure but refused to treat you unless you were in every other regard fit and healthy. That would seem both unfair and ill-judged. Presence of co-occurring conditions might be important for tracking down underlying causes and determining a treatment path, but it's not a reason for excluding someone from receiving services.

3. Even for research purposes, it is not clear that a focus on highly specific disorders makes sense. An underlying assumption, which I remember starting out with, was the idea that the specific cases were in some important sense different from those who had additional problems. Yet, as noted in the CATALISE papers, the evidence for this assumption is missing: nonverbal IQ has very little bearing on a child's clinical profile, response to intervention, or aetiology. For me, what really knocked my belief in the reality of SLI as a category was doing twin studies: typically, I'd find that identical twins were very similar in their language abilities, but they sometimes differed in nonverbal ability, to the extent that one met criteria for SLI and the other did not. Researchers who treat SLI as a distinct category are at risk of doing research that has no application to the real world.

There is nothing to stop researchers focusing on 'pure' cases of language disorder to answer research questions of theoretical interest, such as questions about the modularity of language. This kind of research uses children with a language disorder as a kind of 'natural experiment' that may inform our understanding of broader issues. It is, however, important not to confuse such research with work whose goal is to discover clinically relevant information.

If practitioners let the theoretical interests of researchers dictate their diagnostic criteria, then they are doing a huge disservice to the many children who end up in a no-man's-land, without either diagnosis or access to intervention. 

References

Bishop, D. V. M. (2017). Why is it so hard to reach agreement on terminology? The case of developmental language disorder (DLD). International Journal of Language & Communication Disorders, 52(6), 671-680. doi:10.1111/1460-6984.12335

Bishop, D. V. M., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & CATALISE Consortium. (2016). CATALISE: a multinational and multidisciplinary Delphi consensus study. Identifying language impairments in children. PLOS One, 11(7), e0158753. doi:10.1371/journal.pone.0158753

Bishop, D. V. M., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & CATALISE Consortium. (2017). Phase 2 of CATALISE: a multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology. Journal of Child Psychology and Psychiatry, 58(10), 1068-1080. doi:10.1111/jcpp.12721

Wednesday, 21 November 2012

Moderate drinking in pregnancy: toxic or benign?

There’s no doubt that getting tipsy while pregnant is a seriously bad idea. Alcohol is a toxin that can pass through the placenta to the foetus and cause damage to the developing brain.  For women who are regular heavy drinkers or binge drinkers, there is a risk that the child will develop foetal alcohol syndrome, a condition that affects physical development and is associated with learning difficulties.
But what of more moderate drinking? The advice is conflicting. Many doctors take the view that alcohol is never going to be good for the developing foetus and they recommend complete abstention during pregnancy as a precautionary measure. Others have argued, though, that this advice is too extreme, and that moderate drinking does not pose any risk to the child.

Last week a paper by Lewis et al was published in PLOS One providing evidence on this issue, and concluding that moderate drinking does pose a risk and should be avoided. The methodology of the paper was complex and it’s worth explaining in detail what was done.

The researchers used data from ALSPAC, a large study that followed the progress of several thousand British children from before birth. A great strength of this study is that information was gathered prospectively: in the case of maternal drinking, mothers completed questionnaires during pregnancy, at 18 and 32 weeks gestation.  Obviously, the data won’t be perfect: you have to rely on women to report their intake honestly, but it’s hard to see how else to gather such data without being overly intrusive. When children were 8 years old, they were given a standard IQ test, and this was the dependent variable in the study.

One obvious thing to do with the data would be to see if there is any relationship between the amount drunk in pregnancy and the child’s IQ. Quite a few studies have done this, and a recent systematic review concluded that, provided one excluded women who drank more than 12 g (1.5 UK units) per day or who were binge-drinkers, there was no impact on the child. Lewis et al pointed out, however, that this is not watertight, because drinking in pregnancy is associated with other confounding factors. Indeed, in their study, the lowest IQs were obtained by children of mothers who did not drink at all during pregnancy. However, these mothers were also likely to be younger and less socially advantaged than mothers who drank, making it hard to disentangle causal influences.

So this is where the clever bit of the study design came in, in the shape of Mendelian randomisation. The logic goes like this: there are genetic differences between people in how they metabolise alcohol. Some people can become extremely drunk, or indeed ill, after a single drink, whereas others can drink everyone else under the table. This relates to variation in a set of genes known as ADH genes, which are clustered together on chromosome 4. If a woman metabolises alcohol slowly, this could be particularly damaging to the foetus, because alcohol hangs around in the bloodstream longer. There are quite large racial differences in ADH genes, and for that reason the researchers restricted consideration just to those of White European background. For this group, they showed that variation in ADH genes is not related to social background. So they had a very specific prediction: for women who drank in pregnancy, there should be a relationship between their ADH genes and the child’s outcome. However, if the woman did not drink at all, then the ADH genotype should make no difference. This is the result they reported. It’s important to be clear that they did not directly estimate the impact of maternal drinking on the child’s IQ: rather, they inferred that if ADH genotype is associated with child’s IQ only in drinkers, then this is indirect evidence that drinking is having an impact. This is a neat way of showing that there is an effect of a risk factor (alcohol consumption) avoiding the complications of confounding by social class differences.
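To make the logic concrete, here is a deliberately simplified simulation in Python. The effect sizes and sample sizes are invented for illustration; this is not the authors' model, just a sketch of the prediction: genotype relates to child IQ only in the drinking group.

```python
import random

random.seed(1)

def child_iq(n_risk_alleles, mother_drinks, effect_per_allele=-1.0):
    """Toy model: slow-metabolising (risk) alleles lower child IQ only if
    alcohol was present during pregnancy; otherwise genotype is inert."""
    iq = random.gauss(100, 15)
    if mother_drinks:
        iq += effect_per_allele * n_risk_alleles
    return iq

def mean_iq(mother_drinks, n_risk_alleles, n=20000):
    """Average simulated IQ over n children with a given genotype."""
    return sum(child_iq(n_risk_alleles, mother_drinks) for _ in range(n)) / n

# Genotype should predict IQ in the drinking group but not the abstainers
for alleles in (0, 2, 4):
    print(alleles,
          round(mean_iq(True, alleles), 1),
          round(mean_iq(False, alleles), 1))
```

Running this shows a downward gradient of mean IQ with allele count among drinkers, and a flat line among non-drinkers: exactly the pattern Lewis et al treat as indirect evidence of an alcohol effect.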

Several bloggers, however, were critical of the study. Skeptical Scalpel noted that the effect on IQ was relatively small and not of clinical significance. However, in common with some media reports, he seems to have misunderstood the study and assumed that the figure of 1.8 IQ points was an estimate of the difference between drinkers and abstainers – rather than the effect of ADH risk alleles in drinkers (see below). David Spiegelhalter pointed out that there was no direct estimate of the size of the effect of maternal alcohol intake. Indeed, when drinkers and non-drinkers were directly compared, IQs were actually slightly lower in non-drinkers. Carl Heneghan also commented on the small IQ effect size, but was particularly concerned about the statistical analysis, arguing that it did not adjust adequately for the large number of genetic variants that were considered.

Should we dismiss effects because they are small? I’m not entirely convinced by that argument. Yes, it’s true that IQ is not a precise measure: if an individual child has an IQ of 100, there is error of measurement around that estimate such that the 95% confidence interval is around 95-105 (wider still if a short-form IQ test is used, as was the case here). This measurement error is larger than the per-allele effects reported by Lewis et al., but they were reporting means from very large numbers of children. If there are reliable differences between these means, then this would indicate a genuine impact on cognition, potentially as large as 3.5 IQ points (for those with four rather than two risk alleles). Sure, we should not alarm people by implying that moderate drinking causes clinically significant learning difficulties, but I don’t think we should just dismiss such a result. Overall cognitive ability is influenced by a host of risk factors, most of which are small, but whose effects add together. For a child who already has other risks present, even a small downwards nudge to IQ could make a difference.
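For readers wondering where that 95-105 interval comes from: under classical test theory, the standard error of measurement is derived from the test's reliability. The reliability figure below is my back-of-envelope inference from the interval quoted above, not a value reported by Lewis et al:

```latex
\mathrm{SEM} = SD\sqrt{1-r}, \qquad
95\%\ \mathrm{CI} \approx \mathrm{score} \pm 1.96 \times \mathrm{SEM}
```

With SD = 15 and a CI of roughly ±5 points, SEM ≈ 5/1.96 ≈ 2.55, which implies a reliability of r = 1 − (2.55/15)² ≈ .97; a short-form test has lower reliability, hence the wider interval.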

But what about Heneghan’s concern about the reliability of the results? This is something that also worried me when I scrutinised Table 1, which shows for each genetic locus the ‘per allele’ effect on IQ. I’ve plotted the data for child genotypes in Figure 1. Only one SNP (#10) seems to have a significant effect on child IQ. Yet when all loci were entered into a stepwise multiple regression analysis, no fewer than four child loci were identified as having a significant effect. The authors suggested that this could reflect interactions between genes that are on the same genetic pathway.
Figure 1: Effect of child SNP variants (per allele) on IQ, in IQ points, with 95% CI. Data from Lewis et al, Table 1.

I had been warned about stepwise regression by those who taught me statistics many years ago. Wikipedia has a section on criticisms of the method, noting that results can be biased when many variables are included as predictors. But I found it hard to tell just how serious a problem this was. When in doubt, I find it helpful to simulate data, and so that is what I did in this case, using a function in R that generates multivariate normal data. So I made a dataset where there was no relationship between any of 11 variables – ten of which were designated as genetic loci, and one as IQ. I then ran backwards stepwise regression on the dataset. I repeated this exercise many times, and was surprised at just how often spurious associations of IQ with ‘genotypes’ were seen (as described here). I was concerned that this dataset was not a realistic simulation, because the genotype data from Lewis et al consisted of counts of how many uncommon alleles there were at a given locus (0, 1 or 2 – corresponding to aa, aA or AA, if you remember Mendel’s peas). So I also simulated that situation from the same dataset, but actually it made no difference to the findings. Nor did it make any difference if I allowed for correlations between the ‘genotypes’. Overall, I came away alarmed at just how often you can get spurious results from backwards stepwise regression – at least if you use the AIC criterion that is the default in the R package.
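For anyone who wants to try this themselves, here is a minimal sketch of the exercise. My original simulations used R; this equivalent is in Python, and the sample sizes, the greedy elimination scheme, and the seed are illustrative choices (the AIC here is the Gaussian OLS version, up to an additive constant).

```python
import numpy as np

def aic(y, X):
    # Gaussian OLS AIC up to an additive constant: n*log(RSS/n) + 2k
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    return n * np.log(resid @ resid / n) + 2 * k

def backward_stepwise(y, X):
    """Greedily drop predictors while doing so lowers AIC; return the
    indices of retained predictor columns (column 0 = intercept, always kept)."""
    kept = list(range(1, X.shape[1]))
    best = aic(y, X[:, [0] + kept])
    improved = True
    while improved and kept:
        improved = False
        for c in list(kept):
            trial = [j for j in kept if j != c]
            score = aic(y, X[:, [0] + trial])
            if score < best:
                best, kept, improved = score, trial, True
    return kept

rng = np.random.default_rng(42)
n_runs, n_obs, n_loci = 50, 500, 10
runs_with_spurious_hits = 0
for _ in range(n_runs):
    # ten independent 'genotypes' (0/1/2 allele counts), unrelated to 'IQ'
    X = np.column_stack([np.ones(n_obs),
                         rng.integers(0, 3, size=(n_obs, n_loci)).astype(float)])
    y = rng.normal(100.0, 15.0, size=n_obs)  # pure noise outcome
    if backward_stepwise(y, X):
        runs_with_spurious_hits += 1
print(f"{runs_with_spurious_hits}/{n_runs} null runs kept at least one locus")
```

On most runs the procedure retains one or more completely null 'loci', illustrating how easily stepwise selection with AIC manufactures apparent genotype-IQ associations from noise.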

Lewis et al did one further analysis, generating an overall risk score based on the number of risk alleles (i.e. the version of the gene associated with lower IQ) for the four loci that were selected by the stepwise regression. This gave a significant association with child IQ, just in those who drank in pregnancy: mean IQ was 104.0 (SD 15.8) for those with 4+ risk alleles, 105.4 (SD = 16.1) for those with 3 risk alleles and 107.5 (SD = 16.3) for those with 2 or fewer risk alleles. However, I was able to show very similar results from my analysis of random data: the problem here is that in a very large sample with many variables some associations will emerge as significant just by chance, and if you then select just those variables and add them up, you are capitalising on the chance effect.

One other thing intrigued me. The authors made a binary divide between those who reported drinking in pregnancy and those who did not. The category of drinker spanned quite a wide range, from those who reported drinking less than 1 unit per week (either in the first 3 months or at 32 weeks of pregnancy) to those who reported up to 6 units per week. (Those drinking more than this were excluded, because the interest was in moderate drinkers). Now I’d have thought there would be interest in looking more quantitatively at the impact of moderate drinking, to see if there was a dose-response effect, with a larger effect of genotype on those who drank more. The authors mentioned a relevant analysis where the effect of genotype score on child IQ was greater after adjustment for amount drunk at 32 weeks of pregnancy, but it is not clear whether this was a significant increase, or whether the same was seen for amount drunk at 18 weeks. In particular, one cannot tell whether there is a safe amount to drink from the data reported in this paper. In a reply to my comment on the PLOS One paper, the first author states: “We have since re-run our analysis among the small group of women who reported drinking less than 1 unit throughout pregnancy and we found a similar effect to that which we reported in the paper.” But that suggests there is no dose-response effect for alcohol: I’m not an expert on alcohol effects, but I do find it surprising that less than one drink per week should have an effect on the foetal brain – though as the author points out, it’s possible that women under-reported their intake.

I’m also not a statistical expert and I hesitate to recommend an alternative approach to the analysis, though I am aware that there are multiple regression methods designed to avoid the pitfalls of stepwise regression. It will be interesting to see whether, as predicted by the authors, the genetic variants associated with lower IQ are those that predispose to slow alcohol metabolism. At the end of the day, the results will stand or fall according to whether they replicate in an independent sample.


Reference
Lewis, S. J., Zuccolo, L., Davey Smith, G., Macleod, J., Rodriguez, S., Draper, E. S., Barrow, M., Alati, R., Sayal, K., Ring, S., Golding, J., & Gray, R. (2012). Fetal alcohol exposure and IQ at age 8: Evidence from a population-based birth-cohort study. PLOS One, 7(11). PMID: 23166662

Monday, 3 September 2012

What Chomsky doesn't get about child language

 

© Cartoonstock.com
Noam Chomsky is widely regarded as an intellectual giant, responsible for a revolution in how people think about language. In a recent book by Chomsky and James McGilvray, the Science of Language, the foreword states: “It is particularly important to understand Chomsky’s views … not only because he virtually created the modern science of language by himself … but because of what he and colleagues have discovered about language – particularly in recent years…”  

As someone who works on child language disorders, I have tried many times to read Chomsky in order to appreciate the insights that he is so often credited with. I regret to say that, over the years, I have come to the conclusion that, far from enhancing our understanding of language acquisition, his ideas have led to stagnation, as linguists have gone through increasingly uncomfortable contortions to relate facts about children’s language to his theories. The problem is that the theories are derived from a consideration of adult language, and take no account of the process of development. There is a fundamental problem with an essential premise about what is learned that has led to years of confusion and sterile theorising.
Let us start with Chomsky’s famous sentence "Colourless green ideas sleep furiously". This was used to demonstrate the independence of syntax and semantics: we can judge that this sentence is syntactically well-formed even though it makes no sense. From this, it was a small step to conclude that language acquisition involves deriving abstract syntactic rules that determine well-formedness, without any reliance on meaning. The mistake here was to assume that an educated adult's ability to judge syntactic well-formedness in isolation has anything to do with how that ability was acquired in childhood. Already in the 1980s, those who actually studied language development found that children used a wide variety of cues, including syntactic, semantic, and prosodic information, to learn language structure (Bates & MacWhinney, 1989). Indeed, Dabrowska (2010) subsequently showed that agreement on well-formedness of complex sentences was far from universal in adults.
Because he assumed that children were learning abstract syntactic rules from the outset, Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual learning system: this could be shown by formal proof from mathematical learning theory. The logical problem is that such learning is too unconstrained: any grammatical string of elements is compatible with a wide range of underlying rule systems. The learning becomes a bit easier if children are given negative evidence (i.e., the learner is explicitly told which rules are not correct), but (a) this doesn’t really happen and (b) even if it did, arrival at the correct solution is not feasible without some prior knowledge of the kinds of rules that are allowable. In an oft-quoted sentence, Chomsky (1965) wrote: "A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character." (p. 58) (my italics).
So we were led to the inevitable, if surprising, conclusion that if grammatical structure cannot be learned, it must be innate. But different languages have different grammars. So whatever is innate has to be highly abstract – a Universal Grammar.  And the problem is then to explain how children get from this abstract knowledge to the specific language they are learning. The field became encumbered by creative but highly implausible theories, most notably the parameter-setting account, which conceptualised language acquisition as a process of "setting a switch" for a number of innately-determined parameters (Hyams, 1986). Evidence, though, that children’s grammars actually changed in discrete steps, as each parameter became set, was lacking. Reality was much messier.
Viewed from a contemporary perspective, Chomsky’s concerns about the unlearnability of language seem at best rather dated and at worst misguided. There are two key features in current developmental psycholinguistics that were lacking from Chomsky’s account, both concerning the question of what is learned. First, there is the question of the units of acquisition: for Chomsky, grammar is based on abstract linguistic units such as nouns and verbs, and it was assumed that children operated with these categories. Over the past 15 years, direct evidence has emerged to indicate that children don't start out with awareness of underlying grammatical structure; early learning is word-based, and patterning in the input at the level of abstract elements is something children become aware of as their knowledge increases (Tomasello, 2000).  
Second, Chomsky viewed grammar as a rule-based system that determined allowable sequences of elements. But people’s linguistic knowledge is probabilistic, not deterministic. And there is now a large body of research showing how such probabilistic knowledge can be learned from sequential inputs, by a process of statistical learning. To take a very simple example, if repeatedly presented with a sequence such as ABCABADDCABDAB, a learner will start to be aware of dependencies in the input, i.e. B usually follows A, even if there are some counter-examples. Other types of sequence such as AcB can be learned, where c is an element that can vary (see Hsu & Bishop, 2010, for a brief account). Regularly encountered sequences will then form higher-level units. At the time Chomsky was first writing, learning theories were more concerned with the formation of simple associations, either between paired stimuli, or between instrumental acts and outcomes. These theories were not able to account for the learning of the complex structure of natural language. However, once language researchers started to think in terms of statistical learning, this led to a reconceptualisation of what was learned, and many of the conceptual challenges noted by Chomsky simply fell away.
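This kind of sequential statistic is trivial to compute. As a minimal sketch (the toy sequence is the one above; the code itself is mine, not from any cited study), transitional probabilities can be estimated simply by counting adjacent pairs:

```python
from collections import Counter, defaultdict

def transition_probs(stream):
    """Estimate P(next symbol | current symbol) from a symbol stream."""
    pair_counts = defaultdict(Counter)
    for cur, nxt in zip(stream, stream[1:]):
        pair_counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for cur, counts in pair_counts.items()
    }

probs = transition_probs("ABCABADDCABDAB")
# B follows A on 4 of A's 5 transitions, despite the counter-example A->D
print(probs["A"])  # {'B': 0.8, 'D': 0.2}
```

A learner tracking these statistics needs no prior grammar: the dependency "B usually follows A" simply emerges from the counts.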
Current statistical learning accounts allow us to move ahead and to study the process of language learning. Instead of assuming that children start with knowledge of linguistic categories, categories are abstracted from statistical regularities in the input (see Special Issue 03, Journal of Child Language 2010, vol 37). The units of analysis thus change as the child develops expertise. And, consistent with the earlier writings of Bates and MacWhinney (1989), children's language learning is facilitated by the presence of correlated cues in the input, e.g., prosodic and phonological cues in combination with semantic context. In sharp contrast to the idea that syntax is learned by a separate modular system divorced from other information, recent research emphasises that the young language learner uses different sources of information together. Modularity emerges as development proceeds.
A statistical learning account does not, however, entail treating the child as a “blank slate”. Developmental psychology has for many years focused on constraints on learning: biases that lead the child to attend to particular features of the environment, or to process these in a particular way. Such constraints will affect how language input is processed, but they are a long way from the notion of a Universal Grammar. And such constraints are not specific to language: they influence, for instance, our ability to perceive human faces, or to group objects perceptually.

It would be rash to assume that all the problems of language acquisition can be solved by adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others – Why don’t other species have syntax? How did language evolve? Is linguistic ability distinct from general intelligence?  But we now have a theoretical perspective that makes sense in terms of what we know about cognitive development and neuropsychology, has general applicability to many different aspects of language acquisition, forges links between language acquisition and other types of learning, and leads to testable predictions. The beauty of this approach is that it is amenable both to experimental test and to simulations of learning, so we can identify the kinds of cues children rely on, and the categories that they learn to operate with.

So how does Chomsky respond to this body of work? To find out, I decided to take a look at The Science of Language, which is based on transcripts of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that the book is intended for a general audience and “Professor Chomsky’s contributions to the interview can be understood by all”.  

Well, as “one of the most influential thinkers of our time”, Chomsky fell far short of expectation. Statistical learning and connectionism were not given serious consideration, but were rapidly dismissed as versions of behaviourism that can’t possibly explain language acquisition. As noted by Pullum elsewhere, Chomsky derides Bayesian learning approaches as useless – and at one point claimed that statistical analysis of sequences of elements to find morpheme boundaries “just can’t work” (cf. Romberg & Saffran, 2010). He seemed stuck with his critique of Skinnerian learning and ignorant of how things had changed.
I became interested in not just what Chomsky said, but how he said it.  I’m afraid that despite the reassurances in the preface, I had enormous difficulty getting through this book. When I read a difficult text, I usually take notes to summarise the main points. When I tried that with The Science of Language, I got nowhere because there seemed to be no coherent structure. Occasionally an interesting gobbet of information bobbed up from the sea of verbiage, but it did not seem part of a consecutive argument. The style is so discursive that it’s impossible to précis. His rhetorical approach seemed the antithesis of a scientific argument. He made sweeping statements and relied heavily on anecdote.

A stylistic device commonly used by Chomsky is to set up a dichotomy between his position and an alternative, then represent the alternative in a way that makes it preposterous. For instance, his rationalist perspective on language acquisition, which presupposes innate grammar, is contrasted with an empiricist position in which “Language tends to be seen as a human invention, an institution to which the young are inducted by subjecting them to training procedures”.  Since we all know that children learn language without explicit instruction, this parody of the empiricist position has to be wrong.
Overall, this book was a disappointment: one came away with a sense that a lot of clever stuff had been talked about, and much had been confidently asserted, but there was no engagement with any opposing point of view – just disparagement.  And as Geoffrey Pullum concluded, in a review in the Times Higher Education, there was, alas, no science to be seen.


References
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing (pp. 3-73). Cambridge: Cambridge University Press. Available from: http://psyling.psy.cmu.edu/papers/bib.html
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., & McGilvray, J. (2012). The Science of Language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Dabrowska, E. (2010). Naive v. expert intuitions: An empirical study of acceptability judgements. The Linguistic Review, 27, 1-23.
Hsu, H. J., & Bishop, D. V. M. (2010). Grammatical difficulties in children with specific language impairment (SLI): is learning deficient? Human Development, 53, 264-277.
Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: Reidel.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906-914. DOI: 10.1002/wcs.78
Tomasello, M. (2000). Acquiring syntax is not what you think. In D. V. M. Bishop & L. B. Leonard (Eds.), Speech and Language Impairments in Children: Causes, Characteristics, Intervention and Outcome (pp. 1-15). Hove, UK: Psychology Press.


Correction: 4/9/2012. I had originally cited the wrong reference to Dabrowska (Dabrowska, E. 1997. The LAD goes to school : a cautionary tale for nativists. Linguistics, 35, 735-766). The 1997 paper is concerned with variation in adults' ability to interpret syntactically complex sentences. The 2010 paper cited above focuses on grammaticality judgements.

This article (Figshare version) can be cited as:
 Bishop, Dorothy V M (2014): What Chomsky doesn't get about child language. figshare.
http://dx.doi.org/10.6084/m9.figshare.1030403




A far-too-long response to (some) commentators
12th October 2012
One of the nice things about blogging is that it gives an opportunity to get feedback on one’s point of view. I’d like to thank all those who offered comments on what I’ve written here, particularly those who have suggested readings to support the arguments they make. The sheer diversity of views has been impressive, as is the generally polite and scholarly tone of the arguments. I’ve tried to look seriously at the points people have made and I’ve had a fascinating few weeks reading some of the broader literature recommended by commentators.
I quickly realised that I could easily spend several months responding to comments and reading around this area, so I have had to be selective. I’ll steer clear of commenting on Chomsky’s political arguments, which I see as quite a separate issue. Nor am I prepared to engage with those who suggest Chomsky is above criticism, either because he is so famous, or because he’s been around a long time.  Finally, I won’t say more about the views of those who have expressed agreement, or extensions of my arguments – other than to say thanks: this is a weird subject area where all too often people seem scared to speak out for fear of seeming foolish or ignorant. As Anon (4 Sept) says, it can quickly get vitriolic, which is bad for everyone.  But if we at least boldly say what we think, those with different views can either correct us, or develop better arguments. 
I’ll focus in this reply on the main issues that emerged from the discussion: how far statistical learning is compatible with a Chomskyan account; whether there are things that a non-Chomskyan account simply can’t deal with; and, finally, whether there are points of agreement that could lead to more positive engagement in future between different disciplines.
How compatible is statistical learning with a Chomskyan account?
A central point made by Anon (3rd Sept/4th Sept) and Chloe Marshall (11th Sept) is that probabilistic learning is compatible with Chomsky's views. 
This seems to be an absolutely crucial point. If there really is no mismatch between what Chomsky is saying and those who are advocating accounts of language acquisition in terms of statistical learning, then maybe the disagreement is just about terminology and we should try harder to integrate the different approaches. 
It’s clear we can differentiate between different levels of language processing.  For instance, here are just three examples of how statistical learning may be implicated in language learning:

  • The original work by Saffran et al (1996) focused on demonstrating that infants were sensitive to transitional probabilities in syllable strings. It was suggested that this could be a mechanism that was involved in segmenting words from speech input.
  • Redington et al (1998) proposed that information about lexical categories could be extracted from language input by considering sequential co-occurrences of words.
  • Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns of specific lexical items in their linguistic input, concluding that they first acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe heuristic methods for uncovering structure in input, using the example of the ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive rule-like patterns that support generalization.
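The distributional idea behind Redington et al’s proposal can be shown in miniature. In this sketch, each word is represented by counts of its immediate left and right neighbours; the toy corpus, one-word context window, and cosine similarity measure are my own illustrative choices, far simpler than the original work, but they show how words of the same lexical category come to have similar context vectors:

```python
import math
from collections import Counter

corpus = ("the cat sees the dog . the dog sees the cat . "
          "a cat chases a dog . a dog chases a cat .").split()

def context_vector(word, tokens):
    """Count the immediate left (L:) and right (R:) neighbours of `word`."""
    ctx = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            if i > 0:
                ctx["L:" + tokens[i - 1]] += 1
            if i < len(tokens) - 1:
                ctx["R:" + tokens[i + 1]] += 1
    return ctx

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

cat, dog, sees = (context_vector(w, corpus) for w in ("cat", "dog", "sees"))
# The two nouns share contexts with each other far more than with the verb
print(cosine(cat, dog) > cosine(cat, sees))  # True
```

No category labels appear anywhere in the input: the noun-like behaviour of “cat” and “dog” is recovered purely from sequential co-occurrence, which is the point of the Redington et al result.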

So what does Chomsky make of all of this? I am grateful to Chloe for pointing me to his 2005 paper “Three factors in language design”, which was particularly helpful in tracing the changes in Chomsky’s views over time.
Here’s what he says on word boundaries: 
“In Logical Structure of Linguistic Theory (LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different framework, for identifying morphemes in terms of transitional probabilities, though morphemes do not have the required beads-on-a-string property. The basic problem, as noted in LSLT, is to show that such statistical methods of chunking can work with a realistic corpus. That hope turns out to be illusory, as has recently been shown by Thomas Gambell and Charles Yang (2003), who go on to point out that the methods do, however, give reasonable results if applied to material that is preanalyzed in terms of the apparently language-specific principle that each word has a single primary stress. If so, then the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalyzed in terms of principles specific to the language faculty....”
Gambell and Yang don’t seem to have published in the peer-reviewed literature, but I was able to track down four papers by these authors (Gambell & Yang, 2003; Gambell & Yang, 2004; Gambell & Yang, 2005a; Gambell & Yang, 2005b), which all make essentially the same point. They note that a simple rule that treats a low-probability syllabic transition as a word boundary doesn’t work with a naturalistic corpus where a high proportion of words are monosyllabic. However, adding prosodic information – essentially treating each primary stress as belonging to a new word – achieves a much better level of accuracy. 
The work by Gambell and Yang is exactly the kind of research I like: attempting to model a psychological process and evaluating results against empirical data. The insights gained from the modelling take us forward. The notion that prosody may provide key information in segmenting words seems entirely plausible. If generative grammarians wish to refer to such a cognitive bias as part of Universal Grammar, that’s fine with me. As noted in my original piece, I agree that there must be some constraints on learning; if UG is confined to this kind of biologically plausible bias, then I am happy with UG. My difficulties arise with more abstract and complex innate knowledge, such as is involved in parameter setting (of which, more below).
But, even at this level of word identification, there are still important differences between my position and the Chomskyan one. First of all, I’m not as ready as Chomsky to dismiss statistical learning on the basis of Gambell and Yang’s work. Their model assumed a sequence of syllables was a word unless it contained a low transitional probability. Its accuracy was so bad that I suspect it gave a lower level of success than a simpler strategy: “Assume each syllable is a word.”  But consider another potential strategy for word segmentation in English, which would be “Assume each syllable is a complete word unless there’s a very high transitional probability with the next syllable.” I’d like to see a model like that tested before assuming transitional probability is a useless cue.
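Note that the two segmentation strategies just described are really the same rule with the threshold set at different points: place a word boundary wherever the transitional probability to the next syllable falls below some value. Here is a sketch of that rule (the syllables, probabilities, and thresholds are invented purely for illustration; Gambell and Yang’s evaluations used real child-directed corpora):

```python
def segment(syllables, tp, threshold):
    """Place a word boundary wherever the transitional probability
    between adjacent syllables falls below `threshold`.

    A low threshold mimics the rule Gambell and Yang tested (syllables
    stick together unless TP dips); a high threshold mimics the
    alternative strategy (each syllable is a word on its own unless the
    TP to the next syllable is very high)."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Invented TPs: "ba->by" is a near-certain within-word transition,
# while cross-word transitions are much rarer.
tp = {("ba", "by"): 0.9, ("by", "wants"): 0.1, ("wants", "milk"): 0.3}
stream = ["ba", "by", "wants", "milk"]
print(segment(stream, tp, 0.2))  # ['baby', 'wantsmilk']
print(segment(stream, tp, 0.8))  # ['baby', 'wants', 'milk']
```

With mostly monosyllabic words, the high-threshold setting gives the correct segmentation here, which is why the alternative strategy seems worth testing before transitional probability is written off as a cue.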
Second, Gambell and Yang stay within what I see as a Chomskyan style of thinking which restricts the range of information available to the language processor when solving a particular problem. This is parsimonious and makes modelling tractable, but it’s questionable just how realistic it is. It contrasts sharply with the view proposed by Seidenberg and MacDonald (1999), who argue that cues that individually may be poor at solving a categorisation problem, may be much more effective when used together. For instance, the young child doesn’t just hear words such as ‘cat’, ‘dog’, ‘lion’, ‘tiger’, ‘elephant’ or ‘crocodile’: she typically hears them in a meaningful context where relevant toys or pictures are present. Of course, contextual information is not always available and not always reliable. However, it seems odd to assume that this contextual information is ignored when populating the lexicon. This is one of the core difficulties I have with Chomsky: the sense that meaning is not integrated in language learning. 
Turning to lexical categories, the question is whether Chomsky would accept that these might be discovered by the child through a process of statistical learning, rather than being innate. I have understood that he’d rejected this idea, and have not found any statement by him to suggest otherwise, but others may be able to point to one. Franck Ramus (4th Sept) argues that children do represent some syntactic categories well before this is evident in their language and this is not explained by statistical relationships between words. I’m not convinced by the evidence he cites, which is based on different brain responses to grammatical and ungrammatical sentences in toddlers (Bernal et al, 2010). First, the authors state: “Infants could therefore not detect the ungrammaticality by noticing the co-occurrence of two words that normally never occur together”. But they don’t present any information on transitional probabilities in a naturalistic corpus for the word sequences used in their sentences. All that is needed for statistical learning is for the transitional probabilities to be lower in the ungrammatical than in the grammatical sentences: they don't have to be zero.  Second, the children in this study were two years old, and would have been exposed to a great deal of language from which syntactic categories could have been abstracted by mechanisms similar to those simulated by Redington et al.
Regarding syntax, I was pleased to be introduced to the work of Jeffrey Lidz, whose clarity of expression is a joy after struggling with Chomsky. He reiterates a great deal of what I regard as the ‘standard’ Chomskyan view, including the following:
“Speaking broadly, this research generally finds that children’s representations do not differ in kind from those of adults and that in cases where children behave differently from adults, it is rarely because they have the wrong representations. Instead, differences between children and adults are often attributed to task demands (Crain & Thornton, 1998), computational limitations (Bloom, 1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to representational differences between children and adults (Radford, 1995; see also Goodluck, this volume).” (Lidz, 2008)
The studies cited by Lidz as showing that children’s representations are the same as those of adults – except for performance limitations – have intrigued me for many years. As someone who has long been interested in children’s ability to understand complex sentence structures, I long ago came to realise that the last thing children usually attend to is syntax: their performance is heavily influenced by context, pragmatics, particular lexical items, and memory load. But my response to this observation is very different from that of the generative linguists. Whereas they strive to devise tasks that are free of these influences, I came to the conclusion that they play a key part in language acquisition.  Again, I find myself in agreement with Seidenberg and MacDonald (1999):
“The apparent complexity of language and its uniqueness vis a vis other aspects of cognition, which are taken as major discoveries of the standard approach, may derive in part from the fact that these ‘performance’ factors are not available to enter into explanations of linguistic structure. Partitioning language into competence and performance and then treating the latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the competence theory is developed.” (p. 572)
The main problem I have with Chomskyan theory, as I explained in the original blogpost, is the implausibility of parameter setting as a mechanism of child language acquisition. In The Science of Language, Chomsky (2012) is explicit about parameter-setting as an attractive way out of the impasse created by the failure to find general UG principles that could account for all languages.  Specifically, he says:
“If you’re trying to get Universal Grammar to be articulated and restricted enough so that an evaluation will only have to look at a few examples, given data, because that’s all that’s permitted, then it’s going to be very specific to language, and there aren’t going to be general principles at work. It really wasn’t until the principles and parameters conception came along that you could really see a way in which this could be divorced. If there’s anything that’s right about that, then the format for grammar is completely divorced from acquisition; acquisition will only be a matter of parameter setting. That leaves lots of questions open about what the parameters are; but it means that whatever is left are the properties of language.”
I’m sure readers will point out if I’ve missed anything, but what I take away from this statement is an admission that UG is now seen as consisting of very general and abstract constraints on processing that are not necessarily domain-specific. The principal component of UG that interests Chomsky is 
“an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them.  That’s Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you.” 
I have no difficulty in agreeing with the idea that recursion is a key component of language and  humans have a capacity for this kind of processing. But Chomsky makes another claim that I find much harder to swallow. He sees the separation of UG from parameter-setting as a solution to the problem of acquisition; I see it as just moving the problem elsewhere.  For a start, as he himself notes, there are “a lot of questions open” about what the parameters are.  Also, children don’t behave as if parameters are set one way or another: their language output is more probabilistic. I was interested to read that modifications of Chomskyan theory have been proposed to handle this:
“Developing suggestions of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters valued, and that incoming experience shifts the probability distribution over languages in accord with a learning function that could be quite general. At every stage, all languages are in principle accessible, but only for a few are probabilities high enough so that they can actually be used.” (Chomsky, 2005, p. 9).
So not only can the theory be adapted to handle probabilistic data; probability now assumes a key role, as it is the factor that decides which grammar will be adopted at any given point in development.  But while I am pleased to see the probabilistic nature of children’s grammatical structures acknowledged, I still have problems with this account:
First, it is left unclear why a child opts for one version of the grammar at time 1 and another at time 2, then back to the first version at time 3. If we want an account that is explanatory rather than merely descriptive, then non-deterministic behaviour needs explaining. It could reflect the behaviour of a system that is rule-governed but affected by noise, or it could be a case of different options being selected according to other local constraints. What seems less plausible – though not impossible – is a system that flips from one state to another with a given probability. In a similar vein, if a grammar has an optional setting on a parameter, just what does that mean? Is there a random generator somewhere in the system that determines on a moment-by-moment basis what is produced, or are there local factors that constrain which version is preferred?
Second, this account ignores the fact that early usage of certain constructions is influenced by the lexical items involved (Tomasello, 2006), raising questions about just how abstract the syntax is.
Third, I see a clear distinction between saying that a child has the potential to learn any grammar, and saying that the child has available all grammars from the outset, “with all parameters valued”. I’m happy to agree with the former claim (which, indeed, has to be true, for any typically-developing child), but the latter seems to fly in the face of evidence that the infant brain is very different from the adult brain, in terms of number of neurons, proportion of grey and white matter, and connectivity.  It’s hard to imagine what the neural correlate of a “valued parameter” would be. If the “full array of languages” is already available in the neonate, then how is it that a young child can suffer damage to a large section of the left cerebral hemisphere without necessarily disturbing the ultimate level of language ability (Bishop, 1988)?
Are there things that only a Chomskyan account can explain?
Progress, of course, is most likely when people do disagree, and I suspect that some of the psychological work on language acquisition might not have happened if people hadn’t taken issue with being told that such-and-such a phenomenon proves that some aspect of language must be innate.  Let me take three such examples:
1. Optional infinitives.  I remember many years ago hearing Ken Wexler say that children produce utterances such as “him go there”, and arguing that this cannot have been learned from the input and so must be evidence of a grammar with an immature parameter-setting.  However, as Julian Pine pointed out at the same meeting, children do hear sequences such as this in sentences such as “I saw him go there”, and furthermore children’s optional infinitive errors tend to occur most on verbs that occur relatively frequently as infinitives in compound finite constructions (Freudenthal et al., 2010).
2. Fronted interrogative verb auxiliaries. This is a classic case of an aspect of syntax that Chomsky (1971) used as evidence for Poverty of the Stimulus – i.e., the inadequacy of language input to explain language knowledge. Perfors et al (2010) take this example and demonstrate that it is possible to model acquisition without assuming innate syntactic knowledge. I’m sure many readers would take issue with certain assumptions of the modelling, but the important point here is not the detail so much as the demonstration that some assumptions about impossibility of learning are not as watertight as often assumed: a great deal depends on how you conceptualise the learning process.
3. Anaphoric ‘one’. Lidz et al (2003) argued that toddlers aged around 18 months manage to work out the antecedent of the anaphoric pronoun “one” (e.g. “Here’s a yellow bottle. Can you see another one?”), even though there was insufficient evidence in their language input to disambiguate this. The key issue is whether “another one” is taken to mean the whole noun phrase, “yellow bottle”,  or just its head, “bottle”. Lidz et al note that in the adult grammar the element “one” typically refers to the whole constituent “yellow bottle”. To study knowledge of this aspect of syntax in infants, they used preferential looking: infants were first introduced to a phrase such as “Look! A yellow bottle”. They were then presented with two objects: one described by the same adjective+noun combination (e.g. another yellow bottle), and one with the same noun and a different adjective (e.g. a blue bottle).  Crucially, Lidz et al claimed that 18-month-olds would look significantly more often to the yellow (rather than blue) bottle when asked “Do you see another one?”, i.e., treating “one” as referring to the whole noun phrase, just like adults. This was not due to any general response bias, because they showed the opposite bias (preference for the novel item) if asked a control question “What do you see now?” In addition Lidz et al analysed data from the CHILDES database and concluded that, although adults often used the phrase “another one” when talking to young children, this was seldom in contexts that disambiguated its reference.
This study stimulated a range of responses from researchers who suggested alternative explanations; I won’t go into these here, as they are clearly described by Lidz and Waxman (2004), who go carefully through each one presenting arguments against it. This is another example of the kind of work I like – it’s how science should proceed, with claim and counter-claim being tested until we arrive at a resolution. But is the answer clear?
My first reaction to the original study was simply that I’d like to see it replicated: eleven children per group is a small sample size for a preferential looking study, and does not seem a sufficiently firm foundation on which to base the strong conclusion that children know things about syntax that they could not have learned. But my second reaction is that, even if this replicates, I would not find the evidence for innate knowledge of grammar convincing. Again, things look different if you go beyond syntax. Suppose, for instance, the child interprets “another one” to mean “more”. There is reason to suspect this may occur, because in the same CHILDES corpora used by Lidz, there are examples of the child saying things like “another one book”.   
On this interpretation, the Lidz task would still pose a challenge, as the child has to decide whether to treat “another one” as referring to the specific object (“yellow bottle”), or the class of objects (“bottle”). If the former is correct, then they should prefer the yellow bottle. If the latter, then there’d be no preference. If uncertain, we’d expect a mixture of responses, somewhere between these options. So what was actually found?  As noted above, for children given the control sentence “What do you see now?”, there was a slight bias to pick the new item, and so the old item (yellow bottle) was looked at for only an average of 43% of the time (SD = 0.052). For children asked the key question, “Do you see another one?”, the old item (yellow bottle) was looked at on average 54% of the time (SD = 0.067). The difference between the two instruction types is large in statistical terms (Cohen’s d = 1.94), but the bias away from chance is fairly modest in both cases.  If I’m right and syntax is not the most crucial factor for determining responses, then we might find that the specific test items would affect performance: e.g., a complex noun phrase that describes a stable entity (e.g. a yellow bottle) might be more likely to be selected for “another one” than an object in a transient state (e.g. a happy boy). [N.B. My thanks to Jeffrey Lidz who kindly provided raw data that are the basis of the results presented above].
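For readers who want to check the effect-size arithmetic, applying the simple equal-n pooled-SD form of Cohen’s d to the means and SDs above gives roughly 1.8, in the same ballpark as the d = 1.94 reported (the exact figure depends on which pooling formula is used; this sketch uses the simplest form):

```python
import math

def cohens_d(m1, s1, m2, s2):
    """Cohen's d using a simple equal-n pooled standard deviation."""
    pooled_sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Looking time to the old item: 54% (SD .067) for "Do you see another one?"
# versus 43% (SD .052) for the control question "What do you see now?"
d = cohens_d(0.54, 0.067, 0.43, 0.052)
print(round(d, 2))  # 1.83
```

Either way, the substantive point stands: the between-condition difference is large relative to its variability, while both means sit fairly close to chance.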
Points of agreement – and disagreement – between generative linguists and others
The comments I have received give me hope that there may be more convergence of views between Chomskyans and those modelling language acquisition than I had originally thought. The debate between connectionist ‘bottom up’ and Bayesian ‘top down’ approaches to modelling language acquisition, highlighted by Jeff Bowers (4th Sept) and described by Perfors et al (2011), gets back to basic issues about how far we need a priori abstract symbolic structures, and how far these can be constructed from patterned input. I emphasise again that I would not advocate treating the child as a blank slate. Of course, there need to be constraints affecting what is attended to and what computations are conducted on input. I don’t see it as an either/or choice between bottom-up and top-down. The key questions have to do with what the top-down constraints are, how domain-specific they need to be, and just how far one can go with quite minimal prior specification of structure.
I see these as empirical questions whose answers need to take into account (a) experimental studies of child language acquisition and (b) formal modelling of language acquisition using naturalistic corpora as well as (c) the phenomena described by generative linguists, including intuitive judgements about grammaticality etc. 
I appreciate the patience of David Adger (Sept 11th) in trying to argue for more of a dialogue between generative linguists and those adopting non-Chomskyan approaches to modelling child language. Anon (4th Sept) has also shown a willingness to engage that gives me hope that links may be forged between those working in the classic generative tradition and others who attempt to model language development. I was pleased to be nudged by Anon into reading Becker et al (2011), and agree it is an example of the kind of work that is needed: looking systematically at known factors that might account for observed biases, and pushing to see just how much these could explain. It illustrates clearly that there are generative linguists whose work is relevant for statistical learning. I still think, though, that we need to be cautious in concluding there are innate biases, especially when the data come from adults, whose biases could be learned. There are always possible factors that weren’t controlled – in this case, for instance, I wondered about age-of-acquisition effects (cf. data from a very different kind of task by Garlock et al, 2001). But overall, work like this offers reassurance that not all generative linguists live in a Chomskyan silo – and if I implied that they did, I apologise.
When Chomsky first wrote on this topic, we had neither the corpora nor the computer technology to simulate naturalistic language learning. It remains a daunting task, but I am impressed at what has been achieved so far. I remain of the view that the task of understanding language acquisition has been made unduly difficult by adopting a conceptualisation of what is learned that focuses on syntax as a formal system, learned in isolation from context and meaning. Like Edelman and Waterfall (2007), I also suspect that obstacles have been created by the desire for a ‘beautiful’ theory, i.e. one that is simple and elegant in accounting for linguistic phenomena. My own prediction is that any explanatorily adequate account of language acquisition will be an ugly construction, cobbled together from bits and pieces of cognition, and combining information from many different levels of processing. The test will ultimately be whether we can devise a model that can predict empirical data from child language acquisition. I probably won’t live long enough, though, to see it solved.
References
Becker, M., Ketrez, N., & Nevins, A. (2011). The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language, 87(1), 84-125.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 13(1), 69-76. doi: 10.1111/j.1467-7687.2009.00865.x
Bishop, D. V. M. (1988). Language development after focal brain damage. In D. V. M. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 203-219). Edinburgh: Churchill Livingstone.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36(1), 1-22.
Edelman, S., & Waterfall, H. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253-277.
Freudenthal, D., Pine, J., & Gobet, F. (2010). Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37(3), 643-669. doi: 10.1017/s0305000909990523
Garlock, V. M., Walley, A. C., & Metsala, J. L. (2001). Age-of-acquisition, word frequency and neighborhood density effects on spoken word recognition: Implications for the development of phoneme awareness and early reading ability. Journal of Memory and Language, 45, 468-492.
Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: experimental evidence for syntactic structure at 18 months. Cognition, 89(3), 295-303.
Lidz, J., & Waxman, S. (2004). Reaffirming the poverty of the stimulus argument: a reply to the replies. Cognition, 93, 157-165. 
Perfors, A., Tenenbaum, J. B., & Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3), 306-338. doi: 10.1016/j.cognition.2010.11.001
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607-642. doi: 10.1017/S0305000910000012
Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425-469.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.
Seidenberg, M. S., & MacDonald, M. C. (1999). A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23(4), 569-588.
Tomasello, M. (2006). Acquiring linguistic constructions. In R. Siegler & D. Kuhn (Eds.), Handbook of child psychology (pp. 1-48). Oxford University Press.
 
P.S. 15th October 2012
I have added some links to the response of 12th October. In addition, I have discovered this book, which gives an excellent account of generative vs. constructivist approaches to language acquisition:
Ambridge, B., & Lieven, E. V. M. (2011). Child Language Acquisition: Contrasting Theoretical Approaches. Cambridge: Cambridge University Press.