What's a good metric for PhD students?
NOT Optimising the AI PhD journey for the wrong metric in the mid-2020s
This is my fourth blog (after a long hiatus!) on my research experience. Check out the previous blogs: Ascending the Research Trail, Why Research if I can Develop?! and Riding the Noisy Research Track on Medium. I also switched to Substack for this one coz well, I like the UI more.
So, I started my PhD in CS at Georgia Tech in Fall 2023. It’s only been around 16 months, but it feels more like several years in deep learning research trends. During this time, SAM toppled the classical segmentation task research, and the GPT-4 series of models appeared; model alignment (to human expectations) research has taken off; the Qwen series has (sort of?) overtaken the Llama series as the best-performing and fastest-growing open-source LLMs; LLM-based agent frameworks keep growing; and numerous (maybe too many) fancy AI startups have popped up.
That’s about research in major (industry) leagues. What about academia?
My observations are based on research in the multimodal learning domain; these might not hold for fields like theoretical ML.
From what I’ve seen, the number of papers from solely academic labs has dropped significantly (based on the 900-odd papers I’ve looked at since June 2023, so please take this statement with a grain of salt). Many papers come from PhD students interning at industry research labs, which makes them collaborations between industry and academia. An interesting observation here, though, is the disparity between pure industry papers and academic papers. It seems as if academia now depends on these industry labs for publishing, and for a fair reason: the industry usually has more GPUs.
All this is to say that the industry is dramatically affecting the PhD (research in deep learning) experience nowadays. So, as an international PhD student who spends most of his time at a US academic lab with no long-term collaboration with any industry lab in sight, what should I do to make the most out of a PhD? There are various aspects to this question, to list a few:
How can I focus on producing work I am satisfied with rather than succumbing to the modern publishing frenzy?
Is publishing papers good enough?
How do I not feel too anxious about being unlucky with the timing of my PhD?
How do I optimize my social network?
In this piece, I’ll try to share my opinions and perspective as a second-year PhD student on enduring overthinking and anxiety on an almost daily basis.
The Contemporary Publishing Frenzy
I often find myself reading and analyzing papers (related and unrelated to my research, as one should) from the community. Owing to my reading and slowly growing expertise in multimodal foundational models (and LLMs), I have noticed various trends, not all positive.
Safe and Underperforming: A Bad Combo
I’ve observed a vast number of what I like to call “safe underperforming” works. By safe underperforming, I mean papers that do A+B research, but trivially, arguably only to get something out. I probably wouldn’t have mentioned this if I hadn’t encountered a paper (P’) released today that added a chain-of-thought technique to a paper (P) released in March 2024 by Meta, without showing much improvement. Don’t get me wrong, “safe” works that perform well are good, and “underperforming” works that are not trivial are great. But combine the two, and you have a lousy paper that doesn’t tell the community much, unless you analyze the underperformance rather than sell a small gain as a significant one; that kind of cognitive dissonance or delusion is wrong.
Too many papers
“When a measure becomes a target, it ceases to be a good measure.” - Goodhart’s Law.
Whenever a model performs too well on some benchmark, the above quote starts doing its rounds on x.com. Well, the number of “published” papers or citations has been a metric to quantify a researcher’s impact for a long time. However, it’s becoming toxic in the current AI conference-driven society (in my opinion). I have noticed certain researchers releasing papers on arXiv every couple of weeks or months. Some of these papers are so low in quality and experimental completeness that I can’t help but judge the researchers, which I usually don’t like doing since anyone can have a bad phase, but there’s a limit.
Many of the papers I read (not most, but many) are based only on “fair” ideas rather than innovative approaches. Don’t get me wrong, doing this for a couple of papers might be okay, but seeing the same pattern across more than a few papers is irritating. However, I believe that with the growing number of graduate students and concentrated domains, this is bound to happen.
For example, I have seen two simple recipes for publishing papers in the last year:
Benchmark! So many benchmarking papers have been released recently that most become obsolete within a few months. Many are low-quality or so small (like 100 samples for the VQA task) that it doesn’t make sense to use them to quantify the corresponding improvements. Maybe fewer than 1% (just a figure of speech) of these benchmarks turn out to be helpful, and the others don’t even see the light of day.
Create Dataset, Train, Publish! Another pattern I have seen is papers that design a training dataset to improve performance on some tasks and train an existing model with some architecture modifications. Labs that can scale this effort to produce usable models contribute significantly, while others repeatedly prove these models’ scalability to different tasks with more data.
This is a good way to exploit the current opportunity presented by foundation models that are pre-trained for all tasks but need to be fine-tuned for specialized tasks. But doing this regularly while switching tasks and designing benchmarks or new training data seems silly. Pick a task and try to solve that! In all fairness, solving some tasks is also easy with more data, so it’s like a vicious cycle.
What’s the goal of my PhD?
People often ask me, “When can you graduate?” which loosely translates to “What achievements do you need to graduate?”
Apart from coursework completion, 3 to 4 published papers are usually good enough to make a thesis. A couple of years ago, publishing these papers was quite an effort. Now? I am unsure. These foundational models have made it considerably easier to produce papers (no threshold on quality), which has led to more submissions to conferences and, in turn, more publications. So, for AI PhDs, is this a good target metric for graduation or just a formality?
Is it Innovative Research or Standard Improvements?
Depending on the research domain, it’s becoming increasingly challenging to generate “creative” research directions in a timely fashion that is good enough to keep up with the community. So, how should I decide on my research project?
PhD’s are lousy at this [incremental and stable improvements], because this is precisely the opposite of what PhD programs are designed to train them for. PhDs are supposed to come up with innovative ideas…report the findings to the community by writing papers, and then move on. Once something becomes an actual product (or a product category), we need to stick with it and support it continuously. With a well-established system of processes, the necessity of PhD degrees disappears rapidly. - kyunghyuncho
As reflected in the above quote, I have noticed an increasing trend of productization of research in academia (think of the LLaVA series, CogVLM series, all these benchmarks). This is good since scalability is the primary driving force in the current landscape and will happen continuously. For every project release, one needs to develop a webpage, prepare a GitHub code release, a demo (for models), and a leaderboard (for benchmarks) at the least and sometimes even a teaser video to make the project attractive to the general audience.
Productization in academia means it’s becoming harder to produce papers with one lead author, which is the norm. For example, a single lead PhD student in my lab is responsible for completing the project from start (ideation) to finish (release), with co-authors mainly participating in discussions and feedback. This was the case during my recent internship at Microsoft Research as well. However, the workload involved in creating “impactful” models and benchmarks is too much for a single person nowadays: you need to curate the dataset, decide on the model architecture, evaluate on like 20-30 benchmarks (or, in some cases, propose a new one), “cherry-pick” hundreds of examples, and do all this quickly so as not to get scooped by the hundreds of researchers working on similar things.
I cope with this by working on interesting directions that I feel are not over-explored in the present literature yet innovative, like my recent work, or focusing on a very niche task, which leads to a manageable workload. Still, the resulting impact is at risk. Why? It’s hard to show that a method generalizes when there are hundreds of things to generalize. One might argue that’s not the goal of a PhD research project, but it seems critical in the current times!
The Social Dilemma
With the rise of influencers on social media who hype up AI products and project releases, the scientific quality of work is no longer the only driving force behind its popularity and impact. Sure, it’s still essential, but many people have incomplete knowledge about the background in the research domain. These influencers hype up papers from popular people and labs despite incomplete aspects or unanswered questions about the work. This is where the peer review system becomes so important. However, with the long time gap between the release of works on arXiv and conference publications, it’s impossible to wait and read the paper only when it’s published. This is particularly bad for PhDs who are unfamiliar with a domain and can end up with wrong opinions about what constitutes good work.
I try to counter this by talking to informed people (primarily other PhD students around me). Hyped-up works are often unimpressive in innovation if we exclude the ones with large-scale data engineering (which is also essential). To become more self-sufficient, I also try to look at papers about a myriad of topics.
It’s not the right Time for a PhD
some of them [PhDs] probably feel betrayed, as the gap between what they were promised earlier and what they see now is growing rapidly. some of them probably feel helpless, as their choice of research topics and their work on these topics seem less welcome at these companies. some of them probably feel defeated, as bachelor’s or master’s students seem to be better versed at training and deploying these large-scale models and look to be considered more valuable than they are. - kyunghyuncho
context: I plan to join the industry after my PhD
I realized this the hard way this year and last year while applying for industry research internships. Last year, I only got the offer from Microsoft after having published an MLLM paper, even though I had already published a few papers by then, just not around LLMs. I felt this in the couple of replies I received from researchers who turned me down because of my limited experience in domains like LLMs, NLP, video, and image generation, even though they hadn’t worked in those domains themselves and were simply lucky enough to have completed their PhDs and already be at the company. It happened again this year when someone (a computer vision expert) I wanted to work with told me, “You are certainly one of our strongest candidates, but I decided to hire somebody with a strong NLP/LLM background to do more exploration. Unfortunately this time I couldn't hire more than one intern.”
Apart from having a non-industry-friendly background, connections also matter a lot. I have seen PhDs with fewer papers/citations/experience than me find internships quickly due to their more compatible background and better connections (theirs or their advisors’). I see so many people hiring interns from the labs of people they know. I feel I have been more unlucky in this aspect than usual; some of us sometimes joke that I am cursed, but I hope not!
I even heard a faculty member say, “…he made the right choice not to do a PhD” at CVPR this year about a guy who went for a job instead, and I feel the same. Being a PhD student with a background (and multiple published works) in dense perception tasks and MLLMs seems insufficient for industry labs. I guess I will pivot to something else in the new year, maybe agent models, multimodal generation, pure LLM research, AI safety, or something else where I can do something significant.
I don’t think it’s wise to follow one’s interest completely during a PhD. If you have something to say that will change my opinion, please do so.
Being an international student on an F1 visa in the US is also a disadvantage since I don’t have complete freedom over what I do: I need to work on grant proposals and sponsored projects, and I cannot work at a company full-time alongside my PhD, while most companies do not want to hire part-time. It’s hard, and it may not be the best time to be a PhD student with no long-term industry collaboration unless your lab has a well-established, long-term, industry-friendly plan set up. The jury’s still out on this; I hope things get better for me.
The Social Network
Doing research is only a part of the PhD experience. The connections and collaborations made during a PhD play a massive role in one’s career. I like discussing ideas with others, and that’s the major social part of my workday life. I usually don’t hold myself back from giving people feedback about their research because I feel that’s a critical pre-submission peer review activity with the help of fellow students. So, don’t hesitate to get in touch with me if you ever have feedback about my research or want feedback on yours! The best way to improve as a PhD student is to work with other researchers actively.
That said, I love it when they take it constructively and are honest, but sometimes, I get the feeling that I’ve annoyed someone, which could be possible since sometimes I get too involved while discussing something. I am still trying to improve at detecting such situations, and I think I am getting better. If you have other tips on improving social life and learning as a PhD student, drop me a message on X!
Conclusion
The modern AI PhD experience is not all roses, and focusing on publishing papers or pursuing popular topics may not be optimal. Making the most of the journey involves various aspects, including being wise in your research direction and collaborating.
So, that’s it for now. I might have missed some points, and I’ll add those later whenever I recall. I hope this blog can help you somehow if you are in a similar boat. Please reach out if you have any suggestions for improving my PhD journey! I’ll get back to reading the papers now. I hope to be more regular in blogging in the new year, but I make no promises!