A large number of hedge fund analysts are data scientists. Just because computational finance models were built with a tool that abstracted away a large portion of the programming doesn't mean the analysts weren't using applied statistics and domain-specific modeling to solve problems.
> a large number of hedge fund analysts are data scientists
Maybe I'm being a curmudgeon. In my book, if you can't build the statistical tool you're using, you don't understand it. So if, in Excel, you can fit a regression from scratch (i.e., not using any built-in regression functions) and then use the built-in functions for convenience, that's fine. If you can't, you're a regular financial analyst. (Nothing wrong with that. I was one once.)
This is important because being able to build it means being able to tweak it. Excel's tools have quirks and make built-in assumptions about your data. If those assumptions don't hold, you should be able to tweak (or change) your approach. Being limited to built-in models removes that flexibility. It also implies you don't know when you're crossing between "my tool works" and "my tool is outputting garbage."
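To make "build it from scratch" concrete: here's a minimal sketch of ordinary least squares simple linear regression using only the closed-form slope and intercept formulas, with no stats libraries and made-up illustrative data (the function name and numbers are mine, not from the thread):

```python
# Simple linear regression "from scratch":
#   slope = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)

def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sums of squared / cross deviations from the means.
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Illustrative data lying exactly on y = 2x + 1, so the fit recovers it.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

This is exactly the kind of thing you could reproduce in Excel cells with `SUM` and `AVERAGE` instead of calling `LINEST`, which is the point: once you can write the formulas yourself, you can change them when the built-in assumptions don't fit your data.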
This is absolute pure silliness. So the guy down the street that can't manufacture a new carburetor isn't a mechanic? The doctor that can't create penicillin in his personal office isn't a medical professional?
More often than not, the people who create new systems are not the same people who take those concepts and apply them to practical business problems. It takes many kinds of people to help a company grow.
Hadley Wickham also stressed the importance of readability and reproducibility of code vs. software like Excel.
A sequence of manipulations and button presses doesn't leave a clear history for others on your team to understand, quickly test and reproduce to validate your work, or apply to new data sets.
For work to be "science", it really ought to be transparent and reproducible within your community (e.g., your team within your firm).
This is an important point too. I'm still amused by the story of the "Growth in a Time of Debt" paper. Two famous Harvard economists made an Excel mistake (among other things) that completely changed the conclusion of their study and ultimately affected real-world economic policies.
>So if, in Excel, you can fit a regression from "scratch", (i.e. not using any built-in regression functions) and use the built-in functions for convenience, that's fine.
I suppose it depends on what you mean by "can". Much like whiteboard interviewing, I bet I can't sit down today and implement a regression from scratch, because I'm out of practice. I have had no need to retain that information; there are tools that do it for me.
Could I do it quickly with a stats 101 textbook in my lap (or 10 minutes of Googling)? Absolutely.
Let's take a simple example, linear regression.
You can fit a linear model with a few button clicks in Excel, then look at R² to check if it's high enough and call it a day. Whereas in R you'd be able to easily check a bunch of other diagnostics to assess your model (ANOVA, plotting the residuals, etc.). When there's a lot at stake, you want to make the decision yourself and not just rely on the output of a black box.
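Here's a small illustration of why stopping at R² can mislead (my own toy example, not from the thread): fit a straight line to genuinely quadratic data. The R² comes out around 0.95, which looks like a good fit, but the residuals have an obvious U shape — exactly what a residual plot in R would reveal and a single summary number hides.

```python
# Fit a line to quadratic data: high R-squared, but the residuals
# show clear curvature, so the linear model is systematically wrong.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]          # genuinely nonlinear data

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

my = sum(ys) / len(ys)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))   # 0.95 -- looks fine on its own
print(residuals)      # positive, then negative, then positive: a U shape
```

The residuals run 12, 4, −2, −6, −8, −8, −6, −2, 4, 12: a symmetric U that no single goodness-of-fit number will flag for you.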
All of the things you're talking about are possible to do with Excel. Of course I prefer R, but at some level we all abstract away algorithmic responsibility and result granularity.
> You can also easily obtain summary statistics and ANOVAs, etc, in Excel with a few clicks
To be clear, my original comment was not criticizing Excel. It pokes fun at analysts who couldn't tell a Type I error from a Type II being branded as "data analysts" because they build DCF models.
I think a relevant analogy from back in the day is being able to use FrontPage and then calling yourself a web developer.
Certainly it has good features to create what you want, and you could edit the source (if you can read it!)... but it's used by people who have an elementary knowledge of web development.