Archive for category: Uncategorized

What you should read before starting your PhD

July 17th, 2008 by jose

Here is one of the best summaries of how things can go wrong when one chooses to follow the academic path. I got this from Hacker News. The author of this well-written piece came from industry, and compares the world he knows with what he encountered in academia.

Things he finds:

  1. Doing a PhD is lonely
  2. Picking the right advisor will determine your happiness level more than anything else
  3. The way you code within the academic world has nothing to do with the way people code in industry

But maybe we already know this.

What I’d like to see is someone writing a similar piece on life after your PhD. I had this silly idea that things would be easier and I’d have more time after my PhD thesis for… you know, hobbies and other stuff normal people do. Nothing could be further from reality.

Who needs theories when one has lots of data?

July 8th, 2008 by jose

This article poses an interesting question. Sometimes one has enough data to make accurate predictions without having an understanding of what causes the phenomenon (a model). Nowadays, it’s getting easier and easier to get huge datasets, which are often sufficient to do this.

For example… Google uses massive amounts of misspellings to give ‘on the fly’ corrections. It also uses massive corpora of bilingual texts: its French/English translation engine, for instance, is fed Canadian documents, which are often released in both English and French versions. But there is no theory of language doing smart stuff in the background.
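To make the "data instead of theory" idea concrete, here is a minimal sketch of a corrector that knows nothing about language and only counts what people typed next. The log entries and the names `REWRITE_LOG`, `build_corrections` and `suggest` are made up for illustration; this is not a description of how Google actually does it.

```python
from collections import Counter, defaultdict

# Hypothetical log of (what was typed, what the user retyped right after).
REWRITE_LOG = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "relieve"),
    ("teh", "the"),
    ("teh", "the"),
    ("teh", "tea"),
]

def build_corrections(log):
    """Count how often each typed word was followed by each rewrite."""
    counts = defaultdict(Counter)
    for typed, fixed in log:
        counts[typed][fixed] += 1
    return counts

def suggest(counts, word):
    """Return the most frequent rewrite seen for `word`, or `word` itself if unseen."""
    if word not in counts:
        return word
    return counts[word].most_common(1)[0][0]

corrections = build_corrections(REWRITE_LOG)
print(suggest(corrections, "recieve"))
```

There is no spelling rule anywhere in there: the suggestion is simply whatever most people ended up typing.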

So are theories redundant, or obsolete, in a world where one can make good predictions without them?

Wired’s own Chris Anderson explores the idea:

Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show.

The point here is that statistics can find patterns in basically any area; so maybe we don’t need a specific science to take care of those problems.

There are issues with this line of thinking. Of course, correlation doesn’t imply causation, so doing just this we’d be blind to cause-effect relationships:

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.

Comments by Deepak:

We all know that more data means new approaches to science, especially since this has happened so quickly.

We’ve always worked with partial understanding, or in the case of medicine, less than partial understanding, but that’s precisely why medicine is beginning to fail. Not knowing mechanisms, etc is what results in a VIOXX. Not knowing why is what creates the next disaster.

Trying to solve the exact same problems as Google, we have a camp that does think that knowing ‘why’ is important: the semantic web proponents. Under this paradigm, the web would become a huge ontology, and machines would operate on propositions (RDF triples) to deduce new knowledge. In this case, you do know how the machine reached a certain conclusion. They face the same huge datasets (that is, they will try to operate with ‘the entire web’ at some point; not now, since only a small fraction of sites use RDF at all), but instead of using the raw content that is prepared for human consumption, they will use machine-ready content.

If, after plowing through petabytes of data, a semantic search engine reaches an interesting conclusion, at least it can show us the logical path it used. The promise for pharmaceutical companies is that they could find new drugs and interactions by just letting the algorithms traverse a corpus of, say, proteins. But, again, in this case, there is no ‘human’ postulating a theory either.
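A toy sketch of what "showing the logical path" could look like: facts are stored as (subject, predicate, object) triples, one hand-written rule derives new triples, and every derived triple remembers the premises it came from. The facts, the predicate names and the single rule are all invented for illustration; a real system would work on actual RDF data with a proper reasoner.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("drugA", "inhibits", "proteinX"),
    ("proteinX", "activates", "proteinY"),
}

def forward_chain(facts):
    """Apply one made-up rule until nothing new appears:
    (a inhibits b) and (b activates c)  =>  (a may_reduce c).
    Returns a dict mapping each triple to its premises (None if asserted)."""
    derived = {fact: None for fact in facts}
    changed = True
    while changed:
        changed = False
        known = list(derived)
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if p1 == "inhibits" and p2 == "activates" and b == c:
                    new = (a, "may_reduce", d)
                    if new not in derived:
                        derived[new] = ((a, p1, b), (c, p2, d))
                        changed = True
    return derived

def explain(derived, triple):
    """List the chain of triples that led to `triple`, premises first."""
    premises = derived[triple]
    if premises is None:
        return [f"{triple} (asserted)"]
    lines = []
    for premise in premises:
        lines.extend(explain(derived, premise))
    lines.append(f"{triple} (derived from {premises[0]} and {premises[1]})")
    return lines

knowledge = forward_chain(FACTS)
for line in explain(knowledge, ("drugA", "may_reduce", "proteinY")):
    print(line)
```

Unlike the purely statistical approach, every conclusion here can be traced back to the statements that produced it.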

Probably, what all this means is that we scientists will need to adapt our methods to collaborate with these smart machines. There are things, like deep search, that are better left to them; whereas others, like tagging images, are really hard for machines but trivial for humans.

More on drugs that supposedly give you mental superpowers

May 15th, 2008 by jose

Just as a quick follow-up to this post, … there seems to be a narcolepsy drug that works really well for periods when you need a lot of concentration.

The drug name is Provigil. The article is a pretty hard-core testimonial on its effect. There’s an interesting discussion here. The article seems to mention no negative side effects (other than making you eat less!) but in the discussion some people mention serious stuff like: “nervousness, insomnia, excitation, irritability, tremors, dizziness and headaches”.

FileHamster: easily keep versions of your manuscripts

May 4th, 2007 by jose

FileHamster monitors changes to any kind of file, and keeps versions. I have seen that many academics just use a numbering system in the filename; FileHamster is a bit more elegant.

If you are a programmer, you may know about versioning systems. They are convenient for large projects, particularly for those with more than one programmer. However, they are not easy enough to use to justify the overhead when writing simple manuscripts.

Keeping versions helps the flow of writing, since no matter how much you mangle your manuscript, you can always go back to a previous version. And of course you can add comments to versions. FileHamster stays in the system tray and is pretty unobtrusive.
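For the curious, the basic idea of "keep a copy whenever the file changes" fits in a few lines. This is only a rough sketch of the general technique, not how FileHamster itself works; the function name `snapshot_if_changed` and the `name.v0001.ext` naming scheme are my own assumptions.

```python
import shutil
from pathlib import Path
from typing import Optional

def snapshot_if_changed(source: Path, store: Path) -> Optional[Path]:
    """Copy `source` into `store` as the next numbered version,
    unless it is identical to the latest stored version.
    Returns the new version's path, or None if nothing changed."""
    store.mkdir(parents=True, exist_ok=True)
    # Zero-padded numbers keep the versions in order when sorted by name.
    existing = sorted(store.glob(f"{source.stem}.v*{source.suffix}"))
    if existing and existing[-1].read_bytes() == source.read_bytes():
        return None
    target = store / f"{source.stem}.v{len(existing) + 1:04d}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target
```

Calling it on a timer (or from a save hook in your editor) gives you the "always go back" safety net; going back is just copying an old version over the manuscript.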

FileHamster is updated often, and each version fixes most of the annoyances of the previous one while adding features.

Camera Photocopying

February 8th, 2007 by shane

I don’t need to do much photocopying these days, as my trips to the library are rather infrequent. However, if I do venture that way, I bring along my compact digital camera and avoid those long photocopying queues. (I used to have a 3 megapixel camera phone, but darn it, I lost it, so it’s my trusty Casio instead.) Rather than photocopying chapters or journal articles, I now just photograph them. It’s free of charge, it’s quicker, I don’t have to find a photocopier, and I end up with a digital copy which I can read on my computer, print, or even run OCR on if I were so inclined.
