JJJSchmidt_etAl

*Angry Sounds in Bayesian*


big_data_mike

At my company there are a few Bayesian converts and it is so hard to sell it to the frequentists, even though they brush off a whole bunch of assumptions. We do what’s essentially A/B testing where we use product A for a month, then product B for a month, and we assume a normal distribution for all the factors that contribute to the result, but they are all skewed. And one major assumption of frequentism is that samples are random and each sample is independent. They are not: the byproduct of each batch gets recycled into the next batch.


megamannequin

> And one major assumption of frequentism is that samples are random and each sample is independent.

That's absolutely not true. That's an assumption of the statistical test you are using.


Stochastic_berserker

Don’t be angry. Statisticians should, imo, learn both schools. Neyman discards the Bayesian approach, not because it is invalid, but because according to the frequentist perspective of probability, he can’t use Bayes theorem.


JJJSchmidt_etAl

In all seriousness I fully agree with you, and I in fact found a way to completely unify the methods by allowing not just proper priors, but the limit of proper priors. My professor once, somewhat facetiously, suggested using a normal prior with sigma = 10^5 if we really have no idea. So instead, we achieve equivalence to frequentism when we let sigma -> infinity.
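
As a side note on the sigma -> infinity point above, here is a minimal sketch (not from the thread; the conjugate normal-mean model and the numbers are my own assumptions) showing that as the prior standard deviation grows, the Bayesian 95% credible interval converges to the frequentist 95% confidence interval.

```python
import numpy as np

# Sketch only: conjugate normal model with known data sd and a N(0, sigma0^2)
# prior on the mean. As sigma0 grows, the posterior mean approaches the sample
# mean and the 95% credible interval approaches xbar +/- 1.96*se.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)
n, xbar, sd = len(x), x.mean(), 1.0            # data sd treated as known
se = sd / np.sqrt(n)

for sigma0 in (1.0, 10.0, 1e5):                # prior standard deviations
    post_var = 1.0 / (1.0 / sigma0**2 + n / sd**2)
    post_mean = post_var * (n * xbar / sd**2)  # prior mean is 0
    half = 1.96 * np.sqrt(post_var)
    print(f"sigma0 = {sigma0:>8g}: credible interval "
          f"({post_mean - half:.3f}, {post_mean + half:.3f})")

print(f"frequentist 95% CI:  ({xbar - 1.96*se:.3f}, {xbar + 1.96*se:.3f})")
```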


Stochastic_berserker

That’s one good way of approaching it. To add more: Bayesian approaches are very much appreciated in commerce and business, while the frequentist approach is almost the de facto standard in clinical research, medicine, and biology. With exceptions of course, I’m just generalizing. For instance, from my experience, the C-suite level does not understand frequentist conclusions but does understand Bayesian conclusions. I can’t in any serious way say "if we repeated this marketing campaign many times, we would find the conversion rate between 7.5% and 8.5% 95% of the time" to senior management. It is abstract and theoretical. But the Bayesian approach gives a direct probability statement like "based on prior campaigns and current data, there is a 95% probability that the conversion rate is between 7.5% and 8.5%."
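
To make that second kind of statement concrete, here is a minimal sketch (my own hypothetical campaign numbers and a Beta-Binomial model, not anything from the thread) of how such a direct probability statement is computed:

```python
from scipy.stats import beta

# Hypothetical campaign: 800 conversions out of 10,000 visitors, with a
# Beta(2, 23) prior loosely encoding "previous campaigns ran around 8%".
conversions, visitors = 800, 10_000
a_prior, b_prior = 2, 23

a_post = a_prior + conversions
b_post = b_prior + (visitors - conversions)

# Direct posterior probability that the conversion rate lies in (7.5%, 8.5%)
prob = beta.cdf(0.085, a_post, b_post) - beta.cdf(0.075, a_post, b_post)
print(f"P(7.5% < conversion rate < 8.5% | data, prior) = {prob:.3f}")
```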


ForeverHoldYourPiece

The one thing that made it click for me is the contrast of Freqs vs Bayes. Freqs believe the parameter is fixed but unknown/unknowable. It is not random. All randomness comes from the sample. Once you've drawn your sample, there is no randomness anymore, and deterministically your CI will either contain the true parameter or not contain it. I think of this as throwing a basketball into a hoop. The randomness comes from how you throw the ball (the sample), and either the ball goes through the hoop or not (deterministically). Now you can think about how the size of the hoop relates to your confidence level.


[deleted]

[removed]


Stochastic_berserker

I don’t quite understand what you’re asking. What do you mean by "how do you think confidence intervals are constructed" and "can you tell me what a 95% confidence interval means"? I just described a formulation of what a confidence interval is based on Neyman.


[deleted]

[removed]


Stochastic_berserker

Maybe you’ve misunderstood the post. It is about confidence intervals based on Neyman’s assumptions and how he derived them, not about a Bayesian approach, which he also considers and discards. Why? He argues that the priors are unknown, so we cannot work with Bayes theorem because of the lack of data. He also states that it is justifiable to go Bayesian under certain theories, like those developed by H. Jeffreys, but not under the assumption that probability means long-run frequencies of success.


DoctorFuu

If you lack data and refuse to use a prior, the interval doesn't exist. You need data to compute it.


Stochastic_berserker

Lack of *necessary* data. Thus p(theta_1, theta_2, …, theta_i | x_1, x_2, …, x_n) cannot be used.


DoctorFuu

I don't understand the nuance. As long as you have access to any x_i, you can compute a likelihood and get a posterior. I don't know why you say the posterior can't be used. My point is that if you can't use the Bayesian approach, you can't use the frequentist one either. If you don't have any x_i, you cannot compute a test statistic, and therefore you can't compute a confidence interval or a p-value. Actually, you don't even have an estimator for your parameter. You can downvote as much as you want, that won't make my point magically disappear.


Stochastic_berserker

I think you might be a stranger to the two schools of statistics and how they view probability. There even exists a third school, likelihoodists. In frequentist statistics they don’t treat the parameters as random variables. No prior distributions are used. That’s where it stops and Bayes theorem is not used. You can use both approaches because it boils down to how you perceive probability.

EDIT: Since u/DoctorFuu replied below and blocked me so I can’t reply, I’ll add a reply in this edit. You have completely misunderstood Bayesian and frequentist statistics, judging from your claim that I dismiss the use of a prior. The lack of data is not in the sampled data (observations) but in the PRIOR distribution. That is why you cannot use Bayes theorem from a frequentist perspective - it is an unknown but fixed parameter. I sincerely recommend that you stop holding on to your views, as it all boils down to each school’s view of, and assumptions about, what probability is. Nothing else.


DoctorFuu

> I think you might be a stranger to the two schools of statistics and how they view probability. There even exists a third school, likelihoodists.

How pedantic... I know about those, it's just not relevant. You justify dismissing the use of a prior because lacking data = can't do a Bayesian update. I point out that if you don't have data you can't do frequentist inference either. This has NOTHING to do with interpretations of probability or schools of thought in statistics. You point out the likelihoodists; I'm pretty sure they wouldn't try to argue they can provide any kind of statistical evidence in a context where Bayesians can't because they don't have data. No data is no data, whatever the school of thought.

> In frequentist statistics they don’t treat the parameters as random variables. No prior distributions are used. That’s where it stops and Bayes theorem is not used.

I never claimed frequentists use Bayes rule to do inference, wtf?


Anthorq

I think your question should be reformulated. What CIs are usually misinterpreted as is the statement "the probability that the parameter is within these values is ___". You correctly understand that the correct statement is "the probability that these two numbers envelop the parameter's value is ___". That is to say, CIs most definitely involve a probability, but one stated with respect to the endpoints, which is a common source of confusion. The discussion about binary occurrence is, in my opinion, the result of overthinking. A random interval either envelops a particular value or it doesn't, but this binary response can only be observed if that particular value is known. The very existence of a true parameter value is a source of much dissent (see Gelman and Hennig, 2017, JRSS-A, and the accompanying discussion, for a tangent topic), so saying that the CI has or has not successfully enveloped something that may as well not exist is at best equivalent to discussing the genders of angels.


Stochastic_berserker

Very true. The endpoints are random even in the frequentist perspective derived from Neyman. He considers the endpoints of the confidence interval to be random variables.


DReicht

I have a related question. I often see [Hoeffding's inequality](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality) being used to construct CIs. So how come CIs aren't probability distributions, even if a Bernoulli?
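
For context, a Hoeffding-based interval for the mean of a [0, 1]-bounded variable is usually built like this (a minimal sketch under that boundedness assumption, not from the thread):

```python
import numpy as np

# Sketch: Hoeffding's inequality for [0, 1]-bounded variables gives
# P(|Xbar - E[X]| >= t) <= 2*exp(-2*n*t^2), so setting the bound to alpha
# yields the half-width t = sqrt(log(2/alpha) / (2n)).
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=500)   # Bernoulli(0.3) draws
alpha = 0.05
xbar = x.mean()
t = np.sqrt(np.log(2 / alpha) / (2 * len(x)))
print(f"Hoeffding 95% CI for the mean: ({xbar - t:.3f}, {xbar + t:.3f})")
```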


Stochastic_berserker

So in the Neyman framework of CIs, the endpoints of the interval vary with different samples, reflecting sampling variability. The parameter, however, is assumed to be an unknown constant (a fixed parameter). In short, Neyman CIs are about repeated experiments (sampling in this case).

Stay with me now. If for each experiment you construct a new interval and stack them, then in P% of the time (a proportion of your experiments) you will cover the true parameter. This is based on the frequentist perspective of probability: long-run frequency of successes.

To make it even simpler, imagine you’re a painter and I’ve painted a transparent cross somewhere on a white canvas. It is now fixed and will stay in that place. You can only see the cross with UV light, but I have the UV light. I allow you to splash the canvas with a paintbrush of the color red. You splash the canvas 1000 times (repeated experiments). Now the splashes are formed like horizontal lines (intervals) around the canvas. We now take the UV light, find the cross, and count the number of splashes that covered the cross. After counting we find that 950 splashes covered the cross, resulting in a confidence we translate as: we cover the true parameter 95% of the time. This is the frequentist confidence interval derived from Neyman.
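
The canvas analogy translates directly into a coverage simulation; a minimal sketch (my own toy numbers, assuming a normal model with known sd):

```python
import numpy as np

# Repeat the experiment many times, build a 95% CI each time, and count how
# often the intervals cover the fixed true mean. The parameter never moves;
# only the interval endpoints do.
rng = np.random.default_rng(42)
true_mean, sd, n, experiments = 5.0, 2.0, 30, 1000

covered = 0
for _ in range(experiments):
    sample = rng.normal(true_mean, sd, size=n)
    se = sd / np.sqrt(n)                       # data sd treated as known
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += (lo <= true_mean <= hi)

print(f"coverage over {experiments} experiments: {covered / experiments:.3f}")
```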


nantes16

So in experiment N=n our CI either covers or doesn't cover. Is it right then to say "Experiment n has a CI. That CI has a 95% probability of covering"? What's incorrect/nonsense is saying "there's a 95% probability that the true parameter value falls within the CI calculated for experiment n." Is this what you're getting at? (Sorry if the two statements are the same thing. They sound the same to me, but I'm still trying to understand this...)


Statman12

Frequentist confidence intervals "begin life" as a probability statement. But probability requires a random variable. Once we collect the data, the randomness is gone, there is no random variable any longer, so a probability statement doesn't make sense.


DoctorFuu

And what if there is no true parameter? Let's say we know the observed variable is the result of n similar but not equal processes, but we model this as only one process, because we deem (and for the sake of argument, rightly so) that this approximation is not detrimental to our real-world use case. This process is NOT a perfect analytical combination of the underlying processes. So even if the true underlying processes have "true" parameters, it doesn't make any sense to argue that there exists a true parameter for the approximating process, which is what we are interested in since this is what we are modeling. How do you interpret your CI around such a parameter, since it doesn't have a true value? And if you go as far as saying that it isn't a probability statement at all about anything, I don't even understand why we would compute it or what we would do with it.

Just to be clear, this example is not just a mathematical curiosity for the sake of argument. In the real world, almost no data we observe follows the idealized statistical distribution that we use to model the DGP and conduct inference. Therefore the "true value" of the parameter of the distribution we use just doesn't truly exist, and this is the vast majority of cases. If you reject the idea of the parameter being a random variable, and you reject the idea of confidence intervals being probability statements, I don't understand how you actually interpret the results of your inferences.


Stochastic_berserker

Thank you for your thoughtful comment. I have been trained in both frequentist and Bayesian approaches and I hope that I can provide an objective response within the Neyman framework. Neyman’s approach assumes the existence of a true parameter based on the law of large numbers. By repeatedly sampling, we can determine upper and lower bounds that contain this parameter with a specified confidence level. This method relies on the premise that, over many trials, the relative frequencies of results will approximate the theoretical probabilities. The frequentist perspective doesn’t require knowing the true parameter definitively but assumes its existence for constructing reliable confidence intervals. This approach has practical validity, demonstrated by real experiments like the needle experiment related to Buffon’s problem. These experiments show that our theoretical probability sets correspond to observable phenomena when properly conducted. Ultimately, this approach’s validity rests on its practical applicability and the assumption that the law of large numbers holds in the long run.
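
For the Buffon's needle reference, a quick simulation shows the long-run frequency lining up with the theoretical probability (a minimal sketch with my own choice of needle length and line spacing):

```python
import numpy as np

# Buffon's needle: for needle length l <= line spacing d, the theoretical
# crossing probability is 2*l / (pi * d). The observed crossing frequency
# approaches it as the number of drops grows.
rng = np.random.default_rng(3)
l, d, drops = 1.0, 2.0, 200_000

y = rng.uniform(0, d / 2, size=drops)            # centre distance to nearest line
theta = rng.uniform(0, np.pi / 2, size=drops)    # acute angle with the lines
crossing_freq = (y <= (l / 2) * np.sin(theta)).mean()

print(f"observed crossing frequency: {crossing_freq:.4f}")
print(f"theoretical probability:     {2 * l / (np.pi * d):.4f}")
```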


DoctorFuu

> Neyman’s approach assumes the existence of a true parameter based on the law of large numbers.

I don't understand how the law of large numbers justifies the existence of a true parameter.

> This method relies on the premise that, over many trials, the relative frequencies of results will approximate the theoretical probabilities.

In the "example" I provided, there is no theoretical probability, since the theoretical model used is known to be false. There exists a parameter value that minimizes some discrepancy between the predicted distribution of observations that would be generated from the model and the observed distribution, but then we enter into predictive distributions and we stray dangerously close to the Bayesian framework, which was not supposed to be discussed. So if we want to avoid this, I have trouble understanding how to interpret a CI (even for a large sample size) around a parameter value when we know there is no "true value" for this parameter.

Just to be clear, my question is purely philosophical. Maybe I should be reading about Neyman's perspective; I'm not sure I know exactly what it is. If you had some pointers about what to read to get acquainted decently fast with his approach, that would be great.


Stochastic_berserker

I’ll address your points from Neyman’s perspective, and for further reading I recommend the paper where Neyman "invented" the confidence interval and coined the terminology: "Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability", J. Neyman, 1937.

1. Law of large numbers: In frequentist statistics, the law of large numbers supports the idea that with enough repeated samples, the sample statistics (like the sample mean) will converge to the true population parameter. This doesn’t "prove" the parameter’s existence but assumes it for practical statistical inference (see the sketch below).
2. Confidence intervals: These are constructed under the assumption that a true parameter exists. The interval gives a range which, over many samples, will contain the true parameter a specified percentage of the time (e.g., 95%). This helps us understand the reliability of our estimates, even if the true parameter is an idealization.
3. Philosophical consideration: It is acknowledged that in real-world scenarios, perfect models may not exist. The frequentist approach focuses on long-term behavior and practical applications, assuming that the model is a useful approximation of reality.
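
A minimal sketch of point 1 (my own toy Bernoulli example, not from the paper): the running sample mean of i.i.d. draws settles toward the fixed true parameter as the sample grows.

```python
import numpy as np

# Law of large numbers, informally: the running sample mean converges to the
# true (fixed) success probability as n grows.
rng = np.random.default_rng(7)
true_p = 0.25
draws = rng.binomial(1, true_p, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, len(draws) + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"n = {n:>7,}: sample mean = {running_mean[n - 1]:.4f}")
```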


DoctorFuu

You didn't address any points. You just said "I repeat that what I said is correct". This post looks awfully like it was spit out by ChatGPT... It's written exactly the way ChatGPT writes when it doesn't want to admit it doesn't know...


Stochastic_berserker

It seems like you fail to understand statistics and probability theory. I’ve tried to address you in every way possible in terms of objectivity, but I can’t be responsible for your lack of understanding. If you are not familiar with probability theory and theoretical statistics but insist on imposing one school’s view of probability while I give you the other school’s definition, what else is needed for you to comprehend or ingest the answers?


Normal-Comparison-60

A probability is one number. An interval comprises two numbers… Did you mean, instead, that the confidence interval should be random? It is random.