Erjia Yan, Ph.D.
I am an associate professor in the Department of Information Science, College of Computing and Informatics (CCI), at Drexel University. My research interests lie in informetrics and scientometrics, scholarly data mining and analysis, and knowledge diffusion studies. My research helps provide an empirical foundation for many facets of scientific activity, such as the propagation of innovations, the promotion of better and more transparent science policymaking, and the development of an equitable and sustainable scientific workforce.
Recent Projects:
Academic mobility at HBCUs
The project will collect faculty affiliation data from 35 Historically Black Colleges and Universities (HBCUs) with master's- or doctoral-level programs. It will use the Internet Archive as the primary data source and LinkedIn, ORCID, and ProQuest as secondary data sources. The project will link large, heterogeneous corpora of faculty affiliation data, Carnegie Classification institution profile data, Web of Science publication and citation data, and survey and interview data. The linked data will be used to conduct expansive, cross-domain examinations of the impact of academic moves on individual professors’ research activity and institutional human capital. Funded by NSF award nos. 2122525, 2121861, and 2122691.
Academic mobility in the U.S.
This study uses two open science data sources, ORCID and the Carnegie Classification of Institutions of Higher Education (CCIHE), to identify tenure-track and tenured professors in the U.S. who have changed academic affiliations. It finds that professors tended to move to institutions with higher research intensity, such as those with an R1 or R2 designation in the Carnegie Classification. Additionally, the study finds that female professors are more likely than male professors to move within the same geographic region and that, when they move from a less research-intensive institution to a more research-intensive one, female professors are less likely to retain their rank or attain promotion.
Journal citation scores vs. citation sentiment
This study uses a large data set of PubMed Central (PMC) full-text publications and analyzes citation sentiment in more than 32 million citances within PMC, revealing citation sentiment patterns at the journal and discipline levels. It finds a weak relationship between a journal’s citation impact (as measured by CiteScore) and the average sentiment score of citances to its publications. When journals are aggregated into quartiles based on citation impact, journals in higher quartiles are cited more favorably than those in lower quartiles. Further, social science journals are cited with the highest sentiment, followed by engineering and natural science journals and then biomedical journals. This result may be attributed to disciplinary discourse patterns in which social science researchers tend to use more subjective terms to describe others’ work than do natural science or biomedical researchers.
Nobel papers citation sentiment change
This study measures the perception change as reflected in citation sentiment, with the attainment of a Nobel Prize in Chemistry or a Nobel Prize in Physiology or Medicine considered as the status change. The paper identifies 12,393 citances to 25 Nobel papers in PubMed Central and includes a control paper set of 75 papers with 30,851 citances. Results show a moderate increase in citation sentiment toward Nobel papers post-award. Dynamically, for Nobel papers, there is a steady sentiment increase, and a Nobel Prize seems to co-occur with this trend. This trend, however, is not evident in the control paper set.
NIH funding and associated publications
The conceptual connections between grants and publications are important, yet often overlooked in quantitative studies of science. This study aims to offer the first piece of evidence towards this endeavor by analyzing the ratio of keyword matchedness between accepted NIH research grants from 2008 to 2015 and their funded publications. We identified three predictors of the outcome: 1) the funding rate of an NIH research program in a specific year, 2) the year difference between grant and publication, and 3) the funding size of a grant.
Open access journal impact
Closed access journals have a noticeable advantage in social sciences, while open access journals perform well in medical and healthcare domains. After controlling for a journal’s rank and disciplinary differences, there are statistically more closed access journals in the top 10%, Quartile 1, and Quartile 2 categories as measured by CiteScore; in contrast, more open access journals in Quartile 4 gained scientific impact from 2011 to 2015. Considering dynamic and disciplinary trends in tandem, we find that more closed access journals in Social Sciences gained in impact, whereas in Biochemistry and Medicine, more open access journals experienced such gains.
R software mention and citation network analysis
We developed a software entity extraction method and identified 14,310 instances of R packages across the 13,684 PLoS journal papers mentioning or citing R. A paper-level co-mention network of these packages was visualized and analyzed. We found that the discipline and function of the packages can partly explain the largest clusters. The study offers the first large-scale analysis of R packages’ extensive use in scientific research. As such, it lays the foundation for future explorations of various roles played by software packages in the scientific enterprise.
Research funding vs. citation impact
Using a regression model with Heckman bias correction, we find that funding has a positive, significant association with a paper’s citations in STEMM fields. Further analyses show that this association is magnified by multiple authorship and multiple institutions. For funded papers in STEM, multi-author and multi-institution papers tend to receive even more citations than single-author and single-institution papers; however, funded papers in Medicine received a smaller gain in citation impact when either factor is considered. Based on the finding that funding support has a stronger association with citation impact when it is treated as a binary variable than as a count variable, this study recommends allocating funding to researchers without active funding support, rather than giving awards to those who already hold multiple awards.
Data set mentions and citations
This study provides evidence of data set mentions and citations in multiple disciplines based on a content analysis of 600 publications in PLoS ONE. We find that data set mentions and citations varied greatly among disciplines in terms of how data sets were collected, referenced, and curated. While a majority of articles provided free access to data, formal means of data attribution such as DOIs and data citations were used in a limited number of articles. In addition, data reuse took place in fewer than 30% of the publications that used data, suggesting that researchers are still inclined to create and use their own data sets rather than reuse previously curated data. This study provides a comprehensive understanding of how data sets are used in science and helps institutions and publishers make useful data policies.
Word semantic change
We find that for the selected words in PubMed, overall, meanings were more stable in the 2000s than they were in the 1980s and 1990s. At the topic level, the global distance of most topics is declining, suggesting that the words used to discuss these topics are stabilizing semantically. At the word level, this study identifies two different trends in word semantics, as measured by the study's distance metrics: on the one hand, words can form clusters with their semantic neighbors, and these words, as a cluster, coevolve semantically; on the other hand, words can drift apart from their semantic neighbors while nonetheless stabilizing in the global context. In relating our work to language laws on semantic change, we find no overwhelming evidence to support either the law of parallel change or the law of conformity.
Domain-independent term extraction
This study developed an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of terms present in each discipline. The salient relationships among these vocabularies become apparent when a principal component analysis is applied. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns along with Engineering and Physics; while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns along with the Earth Sciences and Chemistry.
Faculty hiring network analysis
This study examines academic ranking and inequality in library and information science (LIS) using a faculty hiring network of 643 faculty members from 44 LIS schools in the United States. We study academic inequality using four distinct methods (downward/upward placement, Lorenz curve, cliques, and egocentric networks of LIS schools) and find that academic inequality exists in the LIS community. We show that the percentage of downward placement (68%) is much higher than that of upward placement (22%); meanwhile, 20% of the 30 LIS schools that have doctoral programs produced nearly 60% of all LIS faculty, with a Gini coefficient of 0.53. We also find cliques of highly ranked schools and a core/periphery structure that distinguishes LIS schools of different ranks.
Journal knowledge trading analysis
This study employs a set of trading-based indicators to assess sources’ trading impact. These indicators are applied to several time-sliced source-to-source citation networks that comprise 33,634 sources indexed in the Scopus database. Results show that several interdisciplinary sources, such as Nature, PLOS ONE, Proceedings of the National Academy of Sciences, and Science, and several specialty sources, such as Lancet, Lecture Notes in Computer Science, Journal of the American Chemical Society, Journal of Biological Chemistry, and New England Journal of Medicine, have demonstrated their marked importance in knowledge trading. Furthermore, this study reveals that, overall, sources established more trading partners, increased their trading volumes, broadened their trading areas, and diversified their trading contents over the 15 years from 1997 to 2011.
More research outputs can be found at Research.