Yenya's World

Statistics Problem

Is there any statistician reading this blog? Can you recommend any statistics community (web forum, mailing list, anything) where I can ask questions about one problem I am currently trying to solve? For those with login to IS MU, the description will for some time be also in the discussion forum of Faculty of Science. The problem is this:

I have a random process in which the probability that the event has not yet happened by time t is exp(-a*t), for some constant a (think radioactive decay, but the real problem is something different). The problem is to calculate the constant a from the observed data.

The measurements I have are in the form of a set of pairs (t_i, + or -), with the following meaning: At time 0, take a brand new "i-th atom", verify that it has not decayed, wait for the time t_i, and look at it again. If the atom has not decayed yet, add a (t_i, +) pair to the set of measurements. Otherwise, add (t_i, -). Continue with the next new atom and the next time t_{i+1}.
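The measurement procedure above can be imitated with a short simulation. This is only a minimal sketch in Python, not the Perl script mentioned in this post; the function name simulate_measurements, the seed parameter, and the sample times are made up for illustration.

```python
import math
import random


def simulate_measurements(a, times, seed=None):
    """Return a list of (t, '+' or '-') pairs for a decay constant a.

    Each time t gets a brand new atom, which is still undecayed at
    time t with probability exp(-a * t).
    """
    rng = random.Random(seed)
    pairs = []
    for t in times:
        survived = rng.random() < math.exp(-a * t)
        pairs.append((t, "+" if survived else "-"))
    return pairs


if __name__ == "__main__":
    # Hypothetical observation times; in the real problem they come
    # from the outside and follow no particular distribution.
    sample_times = [0.5, 1.0, 2.0, 3.5, 5.0]
    for t, mark in simulate_measurements(a=0.3, times=sample_times, seed=1):
        print(t, mark)
```

With a = 0 every atom survives, and with a very large a every atom has decayed, which is a quick sanity check of the model.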

Note, however, that the times t_i are given to me from the outside; I cannot choose them, and they do not follow any particular distribution (e.g. they are not uniformly distributed between zero and some large number). Also, the number of measurements is quite small (several hundred at most).

You can download a Perl script for generating the test data, the test data (100 rows), and the large test data (10,000 rows) generated by this script. Can you somehow compute which constants a have been used when generating these sets of data? If so, how could it be done? And how can I estimate how accurately the exp(-a*t) curve fits the real data?

4 replies for this story:

J.C. wrote:

I haven't studied your problem to such an extent that I fully understand it, so just an idea: how about applying the logarithm function to the data (y-values)? This could make things a bit easier, as it should transform the data with some-obscure-exponential-dependency to linear-like dependency.

Yenya wrote: Re: logarithms

Yes, log(y) transforms the exponential regression to a linear one. However, log(what?). I have only + or - (i.e. 1 or 0) values. So which points should the linear regression be applied to? 0 and -infinity do not sound feasible to me (remember, the data is relatively sparse, and even with discretisation of the intervals you get intervals with only - values).

Luinar wrote: Solving

Sorry that I won't bother with English, but in Czech it will be much faster to explain the solution. [Translated from Czech:]

For each time t_i you can compute the probability of getting + or -:

P(t_i,+) = exp(-alpha t_i)
P(t_i,-) = 1 - exp(-alpha t_i)

That is, for each particle in the result you are now able to compute its probability as a function of the parameter alpha. And since the particles are independent, the probability of getting this particular data set is the product of the probabilities of the individual particles.

Example. Results:

1 +
2 -
3 -
4 +
5 -

The probability of this result is:

exp(-alpha) [1-exp(-2*alpha)] [1-exp(-3*alpha)] exp(-4*alpha) [1-exp(-5*alpha)]

You then look for the alpha that maximizes this probability. After some minor manipulation and playing with it (for compactness; this is not the only possible way to write it) you get an analytic formula (LaTeX convention):

\sum\limits_{i=1}^N t_i \frac{ \mathrm e^{-\alpha t_i} - h_i }{ 1 - \mathrm e^{-\alpha t_i} } = 0

where N is the number of atoms, t_i are the times of the individual observations, and h_i are the results, written as 0 for - and 1 for +. This equation has to be solved numerically. As a hint, it is good to notice that the sum as a whole is a decreasing function of alpha, so it can be solved well by bisection or by some more advanced gradient method.

If there are further questions, send them to my e-mail ("my nickname" at Seznam). I do not trust the form field, because I do not know whether it would be displayed publicly, in an ideal form for spambots.
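The bisection suggested in this comment can be sketched in Python as follows. The equation is what you get by taking the logarithm of the product of probabilities and setting its derivative with respect to alpha to zero. The function names score and estimate_alpha, the starting bracket, and the tolerance are assumptions made for this sketch; the data is the five-row example from the comment, with + written as 1 and - as 0.

```python
import math


def score(alpha, data):
    """Left-hand side of the equation: sum of t*(e - h)/(1 - e), e = exp(-alpha*t)."""
    total = 0.0
    for t, h in data:
        e = math.exp(-alpha * t)
        one_minus_e = -math.expm1(-alpha * t)  # 1 - e, accurate for small alpha*t
        total += t * (e - h) / one_minus_e
    return total


def estimate_alpha(data, tol=1e-12):
    """Solve score(alpha) = 0 by bisection; data is a list of (t, h) pairs."""
    if any(t <= 0 for t, _ in data):
        raise ValueError("all times must be positive")
    results = [h for _, h in data]
    if 0 not in results or 1 not in results:
        # With only + or only - results the sum never crosses zero.
        raise ValueError("need at least one + and one - result")
    lo, hi = 1e-9, 1.0
    # The sum is decreasing in alpha, so widen the bracket until it
    # is positive at lo and negative at hi.
    while score(lo, data) <= 0.0:
        lo /= 2.0
    while score(hi, data) >= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    example = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 0)]
    print(estimate_alpha(example))
```

Bisection is used here only because the comment names it and the monotonic sum makes it safe; a Newton-type method would likely converge faster.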

Yenya wrote: Re: Solving

Your solution is right. I am sorry, I forgot to update this blog entry, but I already had this solution - a friend of my colleague solved it about a day or two after the above blog post was published. But thanks anyway - I hope it has been a nice mental exercise for you :-) Now the other problem is the last sentence of the blog post: when the data is a bit noisy, I would like to estimate how well it fits the exponential curve. Of course, some estimate can be taken from the probability in your comment, but its value depends heavily on both the number of measurements and the t_i values. I would like to have something which could be used for comparing even data sets with different numbers of measurements and/or different distributions of t_i values.


Pragocentrism

I live in a country with a population of about 10 million, whose capital, Prague, has about 1 million inhabitants. Today's rant will be about narrow-minded journalists living and working in Prague.

I frequently run into blatant cases of pragocentrism. For example, in almost every traffic report on the country-wide, state-funded radio station Radiožurnál they use formulations like this: "there is an accident on the Brno motorway in the direction of Brno". WTF? Which of the three motorways heading to Brno do they mean? The D1 from Ostrava? The D2 from Bratislava? No, of course they report from the perspective of people living in Prague, so naturally by "the Brno motorway" they mean "the motorway from Prague to Brno".

Another one was a few days ago, also on Radiožurnál. They were interviewing a candidate for minister of the interior (who currently works as the head of the anti-monopoly office, an institution located in the barren countryside far away from Prague, namely in Brno :-). The first question was "Have you already got used to living in Brno instead of Prague?". Mr. Pecina replied something like: "I don't understand the question - I am from Frýdek-Místek, I have been living there for almost all of my life, except for one short stay in Prague.". The journalist had naturally expected that every important person must be from Prague. That said, the journalist was really stupid anyway, and she showed it several more times during that interview.

Another case of pragocentrism is more general. In the main news of Czech TV (also state-funded), they often report on Prague-local matters (such as the affairs of the mayor of some part of Prague or even of the mayor of Prague, or the building of some tunnel or stadium in Prague) during the main part of the news, even though they have a separate part called "news from the regions". Also, when covering a country-wide event such as elections, they report on the situation in Prague, and then they say something like "and now we will look into the regions". WTF? Prague is not a region? Why should the Prague-local news be forced on us by state-funded media as something important?

Jan wrote:

It's a custom to call the highway after the major town on it. I never heard something like "the Prague highway", but usual terms are "brněnská", "plzeňská", "mladoboleslavská" or "královéhradecká" and everyone knows which one it is.


HTML <button> Tag

So I wanted to upgrade the form we use in IS MU in many places for selecting a printer, splitting the "print" and "download PDF" functionality to separate buttons. The problem is how to make it as backward-compatible as possible.

I basically wanted to have two buttons with the same name="..." attribute, and distinguish between them by their value="..." attribute. I have come across the cool new (for me anyway :-) HTML tag <button>, which does exactly what I want. I am able to use my own machine-readable value="...", and put the (localized) button label inside the <button>...</button>.
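On the server side, telling the two buttons apart could look roughly like the sketch below. It is written in Python only for illustration and is not the IS MU code; the field name "action" and the values "print" and "pdf" are hypothetical, standing for a form containing <button name="action" value="print">...</button> and <button name="action" value="pdf">...</button>.

```python
from urllib.parse import parse_qs


def requested_action(form_body):
    """Return 'print' or 'pdf' for a URL-encoded form body, else None.

    The field name 'action' and both values are made-up examples.
    """
    fields = parse_qs(form_body)
    values = fields.get("action", [])
    # Accept the submission only when exactly one known value arrived.
    if values == ["print"]:
        return "print"
    if values == ["pdf"]:
        return "pdf"
    return None


if __name__ == "__main__":
    print(requested_action("printer=lj4&action=pdf"))
```

Returning None for anything other than exactly one known value keeps the handler from guessing when the submitted data is ambiguous.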

Except that it does not work in MSIE. That parody of a browser does not send back the contents of the value="..." attribute, but rather the inner text of the <button> tag, and it does so for all buttons in the form, not just for the one actually clicked. Stupid MSIE, die already.

2 replies for this story:

mirka wrote: finally

Stupid MSIE, die, finally Stupid MSIE, you're dead already, go away ?
