I really appreciate you sharing the data with us, and I like your service. But this is a poorly done and far-from-subtle plug of your business at LinkedIn's expense.
1) Technical ability vs # of endorsements
Jesus. Hiding stats that you don't like through aggregation? And please read up on Simpson's Paradox, which is clearly at play here just from looking at your plot. Try a basic t-test, or better yet some statistical rigor, the next time you try to draw conclusions from data.
2) Most endorsed vs Language of Choice
As pointed out, this is not the way to frame your problem. By obfuscating what's happening in your histogram (which isn't technically constructed right either), you are again hiding what you don't like through aggregation. By the way, the language matters greatly here, and you'd have benefited from standardization.
3) Your conclusion
You say "after running some significance testing, though," yet you never post your results or methodology, which is at best questionable after reading your analysis.
Again, I enjoy your service, but blog posts on technical ability that are ironically lacking in technical ability don't really make me want to come back.
PS: A little birdie told me that endorsements are quite strong in predictive power for jobs :)
> It turns out that people’s interview language of choice matched their most endorsed language on LinkedIn just under 50% of the time, so, you know, just slightly worse than flipping a coin.
A coin has 2 sides. How many programming languages are there, again?
I read that as flipping a coin to predict if the most-endorsed language will be the language of choice in an interview. I make no claim to the statistical usefulness of such a statement.
language of choice == linkedin endorsed lang -> {TRUE,FALSE}
Yet if there were no relationship between the two entities and there were N languages, one would expect the random probability of TRUE to be N/N^2 = 1/N (N matching pairs out of N^2 equally likely pairs).
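The 1/N baseline above is easy to check by simulation. This is just a sketch: the choice of 10 candidate languages is an illustrative assumption, not a number from the post.

```python
import random

def match_rate(n_languages, trials=100_000, seed=0):
    """Simulate two independent uniform picks from n_languages options
    and return how often they coincide. Under independence the
    expected rate is N/N^2 = 1/N, as argued above."""
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(n_languages) == rng.randrange(n_languages)
        for _ in range(trials)
    )
    return hits / trials

# With a hypothetical pool of 10 languages, the no-relationship
# baseline is close to 1/10 -- well below the ~50% match rate the
# post compares to a coin flip.
rate = match_rate(10)
```

So a ~50% match rate would actually be far *above* chance if candidates were picking languages at random, which is the point several commenters are making.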
Although, the writer doesn't seem to allege that he's comparing to random or anything like that.
And the variables are measuring different things. You'd expect the most-endorsed language to mirror the language you have the most experience in over your (possibly very long) career, and there's no reason to suppose it should match your current language of choice (which is quite likely more modern than what you used 10 years ago) more than 50% of the time. No matter how (un)reliable endorsements are, this isn't a valid argument against them.
Author here. I appreciate the notes and am happy to revisit and make corrections when needed. To respond to your points:
1. As a sanity check, I did do a t-test of technical ability vs. # of endorsements before publishing. There is no statistically significant relationship between the two (p = 0.335).
2. What do you mean by "language matters here" (re the histogram)?
After reading the way in which your work is being criticized (regardless of the validity of the criticisms), I'm very impressed to see you here making lemonade of it all. I'd be crying in the corner if it were me. Good work.
Do you mean that you fit a simple linear model, of the form below?
ability = b0 + b1*endorsements + error
And when you say t-test, are you saying you did a t-test for the parameter b1?
Usually when people refer to a t-test, without more information, they are saying they tested the difference of means between two groups. (or one mean against a number).
> Do you mean that you fit a simple linear model, of the form below?
That would be the form of the best-fit line in the scatterplot. (and it would make sense to assume that the t-test refers to b1 != 0, as there is only one group)
Edit: on second thought, you're probably right. I think I was too off the cuff in responding. Left original response below.
If by best fit you mean minimizing sum squared error, that's fair.
But to be sure, if someone said t-test, and they only had one group, I would first guess they were doing a one-sample t-test.
Even with two dependent variables and one group, I would think over whether they did a dependent t-test.
I figured it was a simple linear model (in this case a correlation) because they mentioned that they tested the relationship, and it makes sense, but it seems important to sanity check the use of the term t-test, which can be highly ambiguous (and I have seen used in very surprising ways).
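To make the ambiguity concrete: the two readings of "t-test" discussed above compute different statistics answering different questions. Neither appears in the post itself; this is just a plain-Python sketch of the distinction.

```python
import math

def slope_t(x, y):
    """t statistic for H0: b1 = 0 in the simple linear model
    ability = b0 + b1*endorsements + error
    (the 'regression t-test' reading of the author's comment)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b1 / se_b1

def two_sample_t(a, b):
    """Welch t statistic for a difference in means between two groups
    (the more common reading of an unqualified 't-test')."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)
```

With only one group of candidates and two continuous variables, only the first makes sense, which is why the thread keeps asking which one was run.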
It's unclear what you t-tested here. Ideally, you would test for a difference between groups: is there a difference in number of endorsements between people who got a "yes" advancing to the next round and those who didn't? As a follow-up, is there a difference between those whose preferred language was most endorsed and those whose wasn't?
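The follow-up question suggested above is a comparison of two pass rates, which a two-proportion z-test handles directly. The counts below are entirely hypothetical, just to show the shape of the test.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: equal pass rates in two groups, e.g.
    candidates whose interview language matched their most-endorsed
    LinkedIn language vs. those whose didn't."""
    pa, pb = success_a / n_a, success_b / n_b
    p = (success_a + success_b) / (n_a + n_b)        # pooled rate under H0
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (pa - pb) / se

# Hypothetical counts: 30 of 80 matched-language candidates advanced,
# vs. 25 of 90 of the rest. |z| < 1.96 would mean no significant
# difference at the 5% level.
z = two_proportion_z(30, 80, 25, 90)
```

Reporting a statistic like this, with the counts behind it, is the kind of methodology the thread is asking the post to show.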
I'm a bit stunned that you didn't recognize Language as programming language...... :(
As an example, people probably get endorsed for SQL or CSS far more than their programming language of choice that is tested in an interview.
IMO, abusing endorsements like this isn't very useful. At best, it sort of replaces some attribute of a reference. An endorsement is really feedback that validates the field somebody works in and suggests they're probably not a bozo.
If you're interviewing a mid-career programmer, and her endorsements are all for accounting and financial related things, you may have a fit issue to explore. Likewise, if you have a candidate with 200 connections and 3 endorsements, and that's atypical for the industry, you may want to focus more heavily on real reference checking.
Seriously, I'm not sure how they can even pretend to have reached that conclusion. It would require interviewing random LinkedIn users without endorsements (including ones with no technical ability at all) and then rating them.
I certainly think that LinkedIn endorsements aren't exactly a meaningful metric. But they're not meant to be. They're just a simple tool for finding people who can even nominally program.
If you break up the plot by any number of categories of technical ability, then there are trends. But to your question, what I was suggesting was that the aggregations done, especially with categorical data that is averaged, are very susceptible to this. And those clusters are reminiscent of situations like this:
I'd more so like to see this analyzed against who got to the next round (their binary signal), or yes against preferred language, which I suspect will be much more telling.
The takeaway from that plot is, there is more to the story.
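The Simpson's-paradox concern raised here can be shown with a toy example. The numbers are invented purely for illustration: within each hypothetical skill tier, more endorsements track with higher ability, yet the pooled trend points the other way.

```python
def slope_sign(points):
    """Sign of the least-squares slope through (x, y) pairs:
    +1 rising, -1 falling, 0 flat."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return 1 if sxy > 0 else -1 if sxy < 0 else 0

# (endorsements, ability score) for two made-up tiers.
junior = [(1, 1.0), (2, 1.5), (3, 2.0)]
senior = [(10, 0.5), (11, 1.0), (12, 1.5)]   # many endorsements, lower scores

assert slope_sign(junior) == 1       # rising within the tier
assert slope_sign(senior) == 1       # rising within the tier
assert slope_sign(junior + senior) == -1   # pooled trend reverses
```

This is exactly why averaging over a plot with visible clusters can hide, or even flip, the within-group relationship.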