This report presents a comprehensive analysis of Luxembourg’s research publication landscape using data extracted from OpenAlex, an open-access scholarly metadata platform. OpenAlex serves as a freely accessible alternative to proprietary academic databases, providing structured information about publications, authors, institutional affiliations, and research collaborations across all academic disciplines.
The data collection methodology focuses on identifying all scholarly works with Luxembourg institutional affiliations recorded in OpenAlex over the past decade. This approach captures both research led by Luxembourg-based scholars and international collaborative projects where Luxembourg institutions participate as co-authors. The temporal scope provides a current snapshot of the country’s scientific output and evolving research partnerships.
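The report does not show its extraction step, so the following is only a sketch of how such a request could be built against the public OpenAlex API. The helper name `buildWorksUrl` is hypothetical, and the 2015 to 2025 window is an assumption taken from the year sliders used later in the report. The filter names are those of the OpenAlex works endpoint as I understand it, so check them against the current API documentation before relying on them.

```javascript
// Hypothetical helper (not from the report): builds an OpenAlex works query
// for one country and a publication-year window. No request is sent here.
function buildWorksUrl(countryCode, fromYear, toYear) {
  const filters = [
    `authorships.institutions.country_code:${countryCode}`, // any author affiliated in this country
    `from_publication_date:${fromYear}-01-01`,
    `to_publication_date:${toYear}-12-31`
  ];
  const params = new URLSearchParams({
    filter: filters.join(","),
    "per-page": "200", // assumed page size
    cursor: "*"        // assumed cursor-paging start value
  });
  return `https://api.openalex.org/works?${params.toString()}`;
}
```

Calling `buildWorksUrl("LU", 2015, 2025)` returns a single URL string. Fetching and paging through the results is left out on purpose.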
OpenAlex aggregates metadata from multiple sources, including institutional repositories, publisher databases, and citation networks. While this multi-source approach broadens coverage, data quality depends on the accuracy of source reporting and on the platform’s ability to correctly identify and link Luxembourg-affiliated works. These limitations should be kept in mind when interpreting the findings presented throughout this report.
2 Data Structure and Document Types
The initial dataset encompasses all document types recorded in OpenAlex for Luxembourg-affiliated scholarly works during the study period. The following table displays the distribution of work types and the presence of Digital Object Identifiers (DOIs), which serve as persistent identifiers linking to original publications:
Table 1: Distribution of types of work and missingness of DOI
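A table of this kind can be produced by a simple tally. The sketch below assumes OpenAlex-style work records with a `type` field and a `doi` field that is null or absent when no DOI is registered. The function name `tallyTypeAndDoi` and the output column names are illustrative and are not taken from the report's pipeline.

```javascript
// Illustrative tally (assumed record shape: { type, doi }).
// Returns one row per work type with counts of works with and without a DOI,
// sorted by total count, largest first.
function tallyTypeAndDoi(works) {
  const table = new Map();
  for (const { type, doi } of works) {
    const key = type ?? "unknown";
    const row = table.get(key) ?? { type: key, withDoi: 0, withoutDoi: 0 };
    if (doi) row.withDoi += 1;
    else row.withoutDoi += 1;
    table.set(key, row);
  }
  return [...table.values()].sort(
    (a, b) => (b.withDoi + b.withoutDoi) - (a.withDoi + a.withoutDoi)
  );
}
```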
Because journal articles dominate the dataset and are central to academic research communication, the remainder of this analysis is restricted to articles. The selection includes articles with and without a DOI, so that Luxembourg’s article output is covered as completely as OpenAlex allows.
The analysis distinguishes articles by first-authorship status: either the first author is affiliated with a Luxembourg institution, or the article is an international collaboration in which Luxembourg institutions participate as secondary contributors. This classification separates research leadership from research participation within the national research ecosystem:
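The first-author rule can be illustrated with a short sketch. It assumes OpenAlex-style work records in which each entry of `authorships` carries an `author_position` and a list of `institutions` with a `country_code`. The function names are hypothetical, and the report's own implementation (an R pipeline) is not shown here.

```javascript
// Returns true when the first-listed author has at least one institution
// in `countryCode` (assumed OpenAlex-style record shape).
function isFirstAuthorFrom(work, countryCode = "LU") {
  const first = (work.authorships ?? []).find(a => a.author_position === "first");
  if (!first) return false;
  return (first.institutions ?? []).some(inst => inst.country_code === countryCode);
}

// Counts works per publication year, split into "lead" (first author from
// `countryCode`) and "partner" (first author from elsewhere).
function countByYearAndLead(works, countryCode = "LU") {
  const counts = new Map();
  for (const work of works) {
    const row = counts.get(work.publication_year) ?? { lead: 0, partner: 0 };
    if (isFirstAuthorFrom(work, countryCode)) row.lead += 1;
    else row.partner += 1;
    counts.set(work.publication_year, row);
  }
  return counts;
}
```

A first author with several affiliations counts as Luxembourg-led as soon as one of them is in Luxembourg. That is a design choice of this sketch, not a documented rule of the report.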
Table 2: Number of articles where the first author is LU-affiliated
3 Research Domain Analysis
The following visualization examines the temporal distribution of Luxembourg-affiliated research across major scientific domains. The data utilizes OpenAlex’s domain classification system, which categorizes research fields into broad disciplinary areas. The analysis tracks publication volumes over time while maintaining the distinction between Luxembourg-led research (first author affiliation) and collaborative research (non-first author participation):
    // Import required libraries
    import {Plot} from "@observablehq/plot"
    import {Inputs} from "@observablehq/inputs"
    import {rangeInput} from "@mootari/range-slider@1846"

    // Convert R data to JavaScript format
    raw_data = transpose(primary_domain_lu_ojs)

    // Convert data types to ensure Observable Plot can use them
    data = raw_data.map(d => ({
      publication_year: +d.publication_year, // Convert to number
      primary_domain_name: d.primary_domain_name,
      total: +d.total, // Convert to number
      is_lu_first_author: d.is_lu_first_author === "TRUE" || d.is_lu_first_author === true // Convert to boolean
    }))

    // Create color mapping
    domain_colors = new Map([
      ["Health Sciences", "#FF6B35"],
      ["Life Sciences", "#003399"],
      ["MISSING-DOMAIN", "#228B22"],
      ["Physical Sciences", "#FF1493"],
      ["Social Sciences", "#800080"]
    ])

    // Get unique values for controls
    unique_domains = [...new Set(data.map(d => d.primary_domain_name))].sort()

    html`<div style="margin-top: 20px; padding: 15px;">
      <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4>
      <label style="font-size: 16px; color: #333;">Select year range:</label>
    </div>`

    // Input controls
    viewof year_range = rangeInput({min: 2015, max: 2025, value: [2015, 2025], step: 1})

    viewof selected_domains = Inputs.checkbox(
      unique_domains,
      {value: ["Health Sciences"], label: "Research Domains"}
    )

    viewof selected_author_type = Inputs.radio(
      ["LU-affiliated first author", "Non LU-affiliated first author", "Both"],
      {value: "Both", label: "First Author Affiliation"}
    )

    // Filter data based on selections
    filtered_data = data.filter(d => {
      // Year filter
      const yearOk = d.publication_year >= year_range[0] && d.publication_year <= year_range[1];
      // Domain filter
      const domainOk = selected_domains.includes(d.primary_domain_name);
      // Author filter
      let authorOk = false;
      if (selected_author_type === "Both") {
        authorOk = true;
      } else if (selected_author_type === "LU-affiliated first author") {
        authorOk = d.is_lu_first_author === true;
      } else if (selected_author_type === "Non LU-affiliated first author") {
        authorOk = d.is_lu_first_author === false;
      }
      return yearOk && domainOk && authorOk;
    })

    // Create the plot with integer-only year ticks
    Plot.plot({
      width: 800,
      height: 500,
      marginLeft: 60,
      marginBottom: 60,
      x: {
        label: "Publication Year",
        domain: [
          Math.min(...filtered_data.map(d => d.publication_year)),
          Math.max(...filtered_data.map(d => d.publication_year))
        ],
        tickFormat: "d",
        // Force integer ticks only
        ticks: d3.range(
          Math.min(...filtered_data.map(d => d.publication_year)),
          Math.max(...filtered_data.map(d => d.publication_year)) + 1
        )
      },
      y: {label: "Number of Publications", grid: true},
      color: {
        domain: unique_domains,
        range: unique_domains.map(d => domain_colors.get(d)),
        legend: false
      },
      marks: [
        // Solid lines for LU-affiliated first authors
        Plot.line(filtered_data.filter(d => d.is_lu_first_author === true), {
          x: "publication_year",
          y: "total",
          stroke: "primary_domain_name",
          strokeWidth: 2.5,
          z: d => `${d.primary_domain_name}-LU`
        }),
        // Dashed lines for non-LU-affiliated first authors
        Plot.line(filtered_data.filter(d => d.is_lu_first_author === false), {
          x: "publication_year",
          y: "total",
          stroke: "primary_domain_name",
          strokeWidth: 2.5,
          strokeDasharray: "8,8",
          z: d => `${d.primary_domain_name}-NonLU`
        }),
        // Points with different symbols (circle = LU first author, triangle = other)
        Plot.dot(filtered_data, {
          x: "publication_year",
          y: "total",
          fill: "primary_domain_name",
          symbol: d => d.is_lu_first_author ? "circle" : "triangle",
          r: 5,
          stroke: "white",
          strokeWidth: 1
        }),
        // Vertical line rule (only when "Both" is selected)
        ...(selected_author_type === "Both"
          ? [
              Plot.ruleX(filtered_data, Plot.pointerX({
                x: "publication_year",
                stroke: "gray",
                strokeWidth: 1,
                strokeDasharray: "3,3",
                opacity: 0.7
              }))
            ]
          : []),
        // Enhanced tooltips
        Plot.tip(
          filtered_data,
          selected_author_type === "Both"
            ? Plot.pointerX({
                x: "publication_year",
                title: (d, i, data) => {
                  // Group all data points by year for this x-position
                  const year = d.publication_year;
                  const allPointsAtYear = data.filter(point => point.publication_year === year);
                  // Create title showing all domains and their values at this year
                  const yearTitle = `Year: ${year}\n${'-'.repeat(20)}\n`;
                  const entries = allPointsAtYear.map(point =>
                    `${point.primary_domain_name}: ${point.total} (${point.is_lu_first_author ? 'LU' : 'Non-LU'})`
                  ).join('\n');
                  return yearTitle + entries;
                },
                fontSize: 12
              })
            : Plot.pointer({
                x: "publication_year",
                y: "total",
                fill: "primary_domain_name",
                title: d => `${d.primary_domain_name}\n${d.publication_year}\nPublications: ${d.total}\nLU First Author: ${d.is_lu_first_author ? "Yes" : "No"}`
              })
        )
      ],
      title: "Luxembourg Research Publications by Domain Over Time"
    })
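The chart above consumes one row per publication year, domain, and first-author status. As a rough sketch of how such rows could be derived from article-level records, the function below assumes each record carries a `publication_year`, an OpenAlex-style `primary_topic` object (possibly null), and a precomputed `is_lu_first_author` flag. The name `countByYearAndDomain` is hypothetical. The `MISSING-DOMAIN` label is borrowed from the chart's colour map; whether the report assigns it in this way is an assumption.

```javascript
// Aggregates article records into { publication_year, primary_domain_name,
// is_lu_first_author, total } rows (assumed input shape, see lead-in).
function countByYearAndDomain(articles) {
  const totals = new Map();
  for (const a of articles) {
    // Articles without a primary topic fall into a placeholder domain.
    const domain = a.primary_topic?.domain?.display_name ?? "MISSING-DOMAIN";
    const key = JSON.stringify([a.publication_year, domain, a.is_lu_first_author]);
    totals.set(key, (totals.get(key) ?? 0) + 1);
  }
  return [...totals].map(([key, total]) => {
    const [publication_year, primary_domain_name, is_lu_first_author] = JSON.parse(key);
    return { publication_year, primary_domain_name, is_lu_first_author, total };
  });
}
```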
OpenAlex employs a more granular classification system through subfields, which provides greater specificity than broad domains. Given that this taxonomy encompasses over 200 distinct subfields, this analysis focuses on the ten most frequently represented subfields in Luxembourg’s research output:
    // Convert R data to JavaScript format
    raw_data_subfield = transpose(primary_subfield_lu_ojs)

    // Convert data types to ensure Observable Plot can use them
    data_subfield = raw_data_subfield.map(d => ({
      publication_year: +d.publication_year, // Convert to number
      primary_subfield_name: d.primary_subfield_name,
      total: +d.total, // Convert to number
      is_lu_first_author: d.is_lu_first_author === "TRUE" || d.is_lu_first_author === true // Convert to boolean
    }))

    // Create color mapping for the subfields present in the data
    subfield_colors = new Map([
      ["Aerospace Engineering", "#FF6B35"], // Orange-red
      ["Artificial Intelligence", "#003399"], // Deep blue
      ["Computer Networks and Communications", "#228B22"], // Forest green
      ["Computer Vision and Pattern Recognition", "#FF1493"], // Deep pink
      ["Economics and Econometrics", "#800080"], // Purple
      ["Electrical and Electronic Engineering", "#FFD700"], // Gold
      ["Information Systems", "#DE2910"], // Red
      ["Materials Chemistry", "#C8102E"], // Crimson
      ["Molecular Biology", "#009246"], // Green
      ["Neurology", "#AA151B"], // Dark red
      ["Political Science and International Relations", "#FF7F00"], // Orange
      ["Pulmonary and Respiratory Medicine", "#4B0082"], // Indigo
      ["Sociology and Political Science", "#8B4513"] // Saddle brown
    ])

    // Get unique values for controls
    unique_subfields = [...new Set(data_subfield.map(d => d.primary_subfield_name))].sort()

    html`<div style="margin-top: 20px; padding: 15px;">
      <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4>
      <label style="font-size: 16px; color: #333;">Select year range:</label>
    </div>`

    // Input controls
    viewof year_range_subfield = rangeInput({min: 2015, max: 2025, value: [2015, 2025], step: 1})

    viewof selected_subfields = Inputs.checkbox(
      unique_subfields,
      {
        value: ["Artificial Intelligence", "Computer Vision and Pattern Recognition", "Neurology"],
        label: "Research Subfields"
      }
    )

    viewof selected_author_type_subfield = Inputs.radio(
      ["LU-affiliated first author", "Non LU-affiliated first author", "Both"],
      {value: "Both", label: "First Author Affiliation"}
    )

    // Filter data based on selections
    filtered_data_subfield = data_subfield.filter(d => {
      // Year filter
      const yearOk = d.publication_year >= year_range_subfield[0] && d.publication_year <= year_range_subfield[1];
      // Subfield filter
      const subfieldOk = selected_subfields.includes(d.primary_subfield_name);
      // Author filter
      let authorOk = false;
      if (selected_author_type_subfield === "Both") {
        authorOk = true;
      } else if (selected_author_type_subfield === "LU-affiliated first author") {
        authorOk = d.is_lu_first_author === true;
      } else if (selected_author_type_subfield === "Non LU-affiliated first author") {
        authorOk = d.is_lu_first_author === false;
      }
      return yearOk && subfieldOk && authorOk;
    })

    // Create the plot with integer-only year ticks
    Plot.plot({
      width: 800,
      height: 500,
      marginLeft: 60,
      marginBottom: 60,
      x: {
        label: "Publication Year",
        domain: [
          Math.min(...filtered_data_subfield.map(d => d.publication_year)),
          Math.max(...filtered_data_subfield.map(d => d.publication_year))
        ],
        tickFormat: "d",
        ticks: d3.range(
          Math.min(...filtered_data_subfield.map(d => d.publication_year)),
          Math.max(...filtered_data_subfield.map(d => d.publication_year)) + 1
        )
      },
      y: {label: "Number of Publications", grid: true},
      color: {
        domain: unique_subfields,
        range: unique_subfields.map(d => subfield_colors.get(d) || '#888'),
        legend: false
      },
      marks: [
        // Solid lines for LU-affiliated first authors
        Plot.line(filtered_data_subfield.filter(d => d.is_lu_first_author === true), {
          x: "publication_year",
          y: "total",
          stroke: "primary_subfield_name",
          strokeWidth: 2.5,
          z: d => `${d.primary_subfield_name}-LU`
        }),
        // Dashed lines for non-LU-affiliated first authors
        Plot.line(filtered_data_subfield.filter(d => d.is_lu_first_author === false), {
          x: "publication_year",
          y: "total",
          stroke: "primary_subfield_name",
          strokeWidth: 2.5,
          strokeDasharray: "8,8",
          z: d => `${d.primary_subfield_name}-NonLU`
        }),
        // Points with different symbols (circle = LU first author, triangle = other)
        Plot.dot(filtered_data_subfield, {
          x: "publication_year",
          y: "total",
          fill: "primary_subfield_name",
          symbol: d => d.is_lu_first_author ? "circle" : "triangle",
          r: 5,
          stroke: "white",
          strokeWidth: 1
        }),
        // Vertical line rule (only when "Both" is selected)
        ...(selected_author_type_subfield === "Both"
          ? [
              Plot.ruleX(filtered_data_subfield, Plot.pointerX({
                x: "publication_year",
                stroke: "gray",
                strokeWidth: 1,
                strokeDasharray: "3,3",
                opacity: 0.7
              }))
            ]
          : []),
        // Enhanced tooltips
        Plot.tip(
          filtered_data_subfield,
          selected_author_type_subfield === "Both"
            ? Plot.pointerX({
                x: "publication_year",
                title: (d, i, data) => {
                  // Group all data points by year for this x-position
                  const year = d.publication_year;
                  const allPointsAtYear = data.filter(point => point.publication_year === year);
                  // Create title showing all subfields and their values at this year
                  const yearTitle = `Year: ${year}\n${'-'.repeat(20)}\n`;
                  const entries = allPointsAtYear.map(point =>
                    `${point.primary_subfield_name}: ${point.total} (${point.is_lu_first_author ? 'LU' : 'Non-LU'})`
                  ).join('\n');
                  return yearTitle + entries;
                },
                fontSize: 12
              })
            : Plot.pointer({
                x: "publication_year",
                y: "total",
                fill: "primary_subfield_name",
                title: d => `${d.primary_subfield_name}\n${d.publication_year}\nPublications: ${d.total}\nLU First Author: ${d.is_lu_first_author ? "Yes" : "No"}`
              })
        )
      ],
      title: "Luxembourg Research Publications by Subfield Over Time"
    })
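Restricting the chart to the most frequent subfields amounts to ranking subfields by their summed counts. The sketch below does this over rows shaped like the chart's input (`primary_subfield_name`, `total`). The function name `topSubfields` is hypothetical. It ranks on the combined total across both first-author groups; since the colour map above lists thirteen subfields, the report's own selection may instead rank each group separately, which is not shown here.

```javascript
// Returns the names of the `n` subfields with the largest summed `total`.
// Ties are broken alphabetically so the result is deterministic.
function topSubfields(rows, n = 10) {
  const totals = new Map();
  for (const { primary_subfield_name, total } of rows) {
    totals.set(primary_subfield_name, (totals.get(primary_subfield_name) ?? 0) + total);
  }
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([name]) => name);
}
```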
4 International Collaboration Patterns
The analysis extends to examining Luxembourg’s research collaboration patterns with international partners. The dataset contains information about co-author affiliations, enabling identification of the most frequent collaborating countries and regions. This section presents the geographical distribution of research partnerships, organized by publication year and distinguished by Luxembourg’s role as lead author versus collaborative partner.
The data processing groups countries into meaningful categories, including major individual nations, regional blocs, and an aggregated “Others” category for countries with lower collaboration frequencies. This approach provides clarity while maintaining analytical depth regarding Luxembourg’s primary research partnerships:
    // Convert R data to JavaScript format
    raw_country_data = transpose(country_authors_unique_ojs)

    // Convert data types to ensure Observable Plot can use them
    country_data = raw_country_data.map(d => ({
      publication_year: +d.publication_year,
      is_lu_first_author: d.is_lu_first_author === "TRUE" || d.is_lu_first_author === true,
      country_groups: d.country_groups,
      n: +d.n
    }))

    // Create color mapping for countries
    country_colors = new Map([
      ["European Union", "#003399"],
      ["Others", "#FF1493"],
      ["Luxembourg", "#FF6B35"],
      ["France", "#800080"],
      ["USA", "#228B22"],
      ["Belgium", "#FFD700"],
      ["China", "#DE2910"],
      ["Great Britain", "#C8102E"],
      ["Italy", "#009246"],
      ["Spain", "#AA151B"],
      ["Switzerland", "#FF0000"],
      ["Netherlands", "#FF7F00"]
    ])

    // Get unique values for controls
    unique_years = [...new Set(country_data.map(d => d.publication_year))].sort()
    unique_countries = [...new Set(country_data.map(d => d.country_groups))].sort()

    html`<div style="margin-top: 20px; padding: 15px;">
      <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4>
      <label style="font-size: 16px; color: #333;">Select year range:</label>
    </div>`

    // Note: filtered_country_data comes from an input-filtering cell that is not shown here.
    Plot.plot({
      width: 900,
      height: 600,
      marginLeft: 80,
      marginBottom: 80,
      marginRight: 40,
      x: {label: null, axis: null, paddingOuter: 0.2},
      y: {label: "Total Publications", grid: true},
      fx: {label: "Publication Year", tickFormat: "d"},
      fy: {label: null, tickFormat: d => d ? "LU First Author" : "LU Not First Author"},
      color: {
        domain: unique_countries,
        range: unique_countries.map(d => country_colors.get(d)),
        legend: false
      },
      facet: {
        data: filtered_country_data,
        x: "publication_year",
        y: "is_lu_first_author",
        marginTop: 40
      },
      marks: [
        Plot.barY(filtered_country_data, Plot.groupX(
          {y: "sum"},
          {
            x: "country_groups",
            y: "n",
            fill: "country_groups",
            stroke: "black",
            strokeWidth: 0.5,
            tip: true,
            title: d => `${d.country_groups}\n${d.publication_year}\nPublications: ${d.n}\nLU First Author: ${d.is_lu_first_author ? "Yes" : "No"}`
          }
        )),
        Plot.frame({stroke: "black", strokeWidth: 1})
      ],
      style: {fontSize: "12px"},
      title: "Publications by Country/Region Over Time (Faceted by Luxembourg First Author Status)"
    })
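The grouping of countries into named nations, a regional bloc, and "Others" can be sketched as a lookup on ISO country codes. The group labels below are the ones used in the chart's colour map. Everything else is an assumption: the report does not show its grouping rule, so this sketch treats "European Union" as the member states that are not listed individually, and the names `NAMED_GROUPS`, `OTHER_EU_MEMBERS`, and `groupCountry` are hypothetical.

```javascript
// Countries shown individually in the chart (labels from the colour map).
const NAMED_GROUPS = new Map([
  ["LU", "Luxembourg"], ["FR", "France"], ["US", "USA"], ["BE", "Belgium"],
  ["CN", "China"], ["GB", "Great Britain"], ["IT", "Italy"], ["ES", "Spain"],
  ["CH", "Switzerland"], ["NL", "Netherlands"]
]);

// Assumption: the "European Union" bloc covers the remaining member states.
const OTHER_EU_MEMBERS = new Set([
  "AT", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "DE", "GR", "HU",
  "IE", "LV", "LT", "MT", "PL", "PT", "RO", "SK", "SI", "SE"
]);

// Maps an ISO 3166-1 alpha-2 code to its display group.
function groupCountry(code) {
  if (NAMED_GROUPS.has(code)) return NAMED_GROUPS.get(code);
  if (OTHER_EU_MEMBERS.has(code)) return "European Union";
  return "Others";
}
```

Under these assumptions, `groupCountry("DE")` falls into the bloc, while a code outside both lists falls into "Others".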
The table below shows citation statistics by year and domain:
Table 3: Citation statistics
As expected, citation patterns, regardless of discipline, publication year, or author affiliation, tend to follow a power-law distribution. Most research articles receive relatively few citations, as shown by the low median. In the health sciences, for LU-affiliated lead authors, only 25% of articles published in 2024 received at least 2 citations (as shown in the 75th percentile column in the table above), and just 1% reached 16 citations. However, citation counts accumulate over time, so it is more informative to consider older publications. For the same group, among articles published in 2019, 25% received at least 18 citations, and 1% reached 147 or more. The highest citation count observed for a single article in this group was 482.
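Percentile summaries like the ones quoted above can be computed from a plain list of citation counts. The sketch below uses linear interpolation between order statistics, which is the default method of R's `quantile()`. Whether the report's table was computed this way is an assumption, and the function names are illustrative.

```javascript
// Quantile of an ascending-sorted numeric array using linear interpolation
// between the two nearest order statistics.
function quantile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const pos = (sortedValues.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sortedValues[lower] * (1 - weight) + sortedValues[upper] * weight;
}

// Summary statistics for one group of articles (e.g. one year and domain).
function citationSummary(counts) {
  const sorted = [...counts].sort((a, b) => a - b);
  return {
    n: sorted.length,
    median: quantile(sorted, 0.5),
    p75: quantile(sorted, 0.75),
    p99: quantile(sorted, 0.99),
    max: sorted.length ? sorted[sorted.length - 1] : NaN
  };
}
```

With a skewed sample, the median stays low while the 99th percentile and maximum are pulled up by a few highly cited articles, which is the pattern described in the text.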
Luxembourg has three official languages, but English is the language of science regardless of domain:
Source Code
---title: "Research Publications Affiliated with Luxembourg: A Data-Driven Analysis"---```{r}#| include: falselibrary(DT)library(dplyr)library(ggplot2)library(rixpress)library(tidyr)# Load datarxp_load("type_doi_missing")rxp_load("lu_first_authors")rxp_load("primary_domain_lu")rxp_load("primary_subfield_lu")rxp_load("languages")rxp_load("citation_data_stats")rxp_load("country_authors_unique")```## IntroductionThis report presents a comprehensive analysis of Luxembourg's researchpublication landscape using data extracted from OpenAlex, an open-accessscholarly metadata platform. OpenAlex serves as a freely accessible alternativeto proprietary academic databases, providing structured information aboutpublications, authors, institutional affiliations, and research collaborationsacross all academic disciplines.The data collection methodology focuses on identifying all scholarly works withLuxembourg institutional affiliations recorded in OpenAlex over the past decade.This approach captures both research led by Luxembourg-based scholars andinternational collaborative projects where Luxembourg institutions participateas co-authors. The temporal scope provides a current snapshot of the country'sscientific output and evolving research partnerships.OpenAlex aggregates metadata from multiple sources including institutionalrepositories, publisher databases, and citation networks. While thismulti-source approach enhances coverage comprehensiveness, data quality dependson the accuracy of source reporting and the platform's ability to correctlyidentify and link Luxembourg-affiliated works. These methodologicalconsiderations should be kept in mind when interpreting the analytical findingspresented throughout this report.## Data Structure and Document TypesThe initial dataset encompasses all document types recorded in OpenAlex forLuxembourg-affiliated scholarly works during the study period. 
The followingtable displays the distribution of work types and the presence of Digital ObjectIdentifiers (DOIs), which serve as persistent identifiers linking to originalpublications:```{r}#| echo: false#| label: tbl-type-doi-missing#| tbl-cap: Distribution of types of work and missingness of DOIdatatable( type_doi_missing,caption ="Distribution of types of work and missingness of DOI",filter =list(position ='top', clear =FALSE),options =list(pageLength =15,scrollX =TRUE,dom ='Bfrtip',buttons =c('excel') ),extensions ='Buttons')```Based on the substantial predominance of journal articles in the dataset andtheir central importance in academic research communication, this analysisrestricts its focus to articles exclusively. This selection encompasses bothpublications with DOI identifiers and those without, ensuring comprehensivecoverage of Luxembourg's peer-reviewed research output.The analytical framework employs a critical distinction based on firstauthorship status: whether the primary author maintains affiliation with aLuxembourg institution or represents an international collaboration whereLuxembourg institutions participate as secondary contributors. 
Thisclassification enables differentiation between research leadership and researchparticipation within the national research ecosystem:```{r}#| echo: false#| label: tbl-lu-first-authors#| tbl-cap: Number of articles where the first author is LU-affiliatedlu_first_authors <- lu_first_authors %>%pivot_wider(names_from = is_lu_first_author, values_from = total) %>%rename(`Publication Year`= publication_year,`Non-LU first author`=`FALSE`,`LU first author`=`TRUE` ) %>%mutate(`Publication Year`=as.integer(`Publication Year`))datatable( lu_first_authors,filter =list(position ='top', clear =FALSE),caption ="Number of articles where the first author is LU-affiliated",options =list(pageLength =10,scrollX =TRUE,dom ='Bfrtip',buttons =c('excel'),order =list(list(0, 'desc')) ),extensions ='Buttons') %>%formatStyle(columns =colnames(.), fontsize ='12px')```## Research Domain AnalysisThe following visualization examines the temporal distribution ofLuxembourg-affiliated research across major scientific domains. The datautilizes OpenAlex's domain classification system, which categorizes researchfields into broad disciplinary areas. 
The analysis tracks publication volumesover time while maintaining the distinction between Luxembourg-led research(first author affiliation) and collaborative research (non-first authorparticipation):```{r}#| include: false#| fig-height: 6#| fig-width: 12domain_colors <-c(# Add your actual domain names here with distinctive colors"Health Sciences"="#FF6B35", # Orange-red"Life Sciences"="#003399", # Deep blue"MISSING-DOMAIN"="#228B22", # Forest green"Physical Sciences"="#FF1493", # Deep pink"Social Sciences"="#800080"# Purple)primary_domain_lu %>%mutate(is_lu_first_author =ifelse(is_lu_first_author, "LU-affiliated first author", "Non LU-affiliated first author")) %>%ggplot(aes(x = publication_year,y = total, color = primary_domain_name,group = primary_domain_name, )) +geom_line(linewidth =1.2, alpha =0.8) +geom_point(size =2, alpha =0.9) +scale_color_manual(values = domain_colors) +facet_wrap(~is_lu_first_author,labeller =labeller(is_lu_first_author =c("LU-affiliated first author"="LU-affiliated first author","Non LU-affiliated first author"="Non LU-affiliated first author" ) ) ) +labs(title ="Luxembourg Research Publications by Domain Over Time",subtitle ="Faceted by Luxembourg First Author Status",x ="Publication Year",y ="Number of Publications",color ="Research Domain" ) +theme_minimal() +theme(axis.text.x =element_text(size =10),plot.title =element_text(size =14, face ="bold"),plot.subtitle =element_text(size =12),legend.position ="bottom",legend.title =element_text(face ="bold"),strip.text =element_text(face ="bold", size =11),strip.background =element_rect(fill ="lightgray", color ="black"),axis.title =element_text(size =12),panel.grid.major =element_line(color ="lightgray", linewidth =0.3),panel.grid.minor =element_blank() ) +# Adjust legend to show in multiple rows if neededguides(color =guide_legend(nrow =2, byrow =TRUE))``````{r}#| echo: falseojs_define(primary_domain_lu_ojs = primary_domain_lu)``````{ojs}//| echo: false// Import required librariesimport 
{Plot} from"@observablehq/plot"import {Inputs} from"@observablehq/inputs"import {rangeInput} from"@mootari/range-slider@1846"// Convert R data to JavaScript formatraw_data =transpose(primary_domain_lu_ojs)// Convert data types to ensure Observable Plot can use themdata = raw_data.map(d => ({publication_year:+d.publication_year,// Convert to numberprimary_domain_name: d.primary_domain_name,total:+d.total,// Convert to numberis_lu_first_author: d.is_lu_first_author==="TRUE"|| d.is_lu_first_author===true// Convert to boolean}))// Create color mappingdomain_colors =newMap([ ["Health Sciences","#FF6B35"], ["Life Sciences","#003399"], ["MISSING-DOMAIN","#228B22"], ["Physical Sciences","#FF1493"], ["Social Sciences","#800080"]])// Get unique values for controlsunique_domains = [...newSet(data.map(d => d.primary_domain_name))].sort()html`<div style="margin-top: 20px; padding: 15px;"> <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4> <label style="font-size: 16px; color: #333;">Select year range:</label></div>`viewof year_range =rangeInput({min:2015,max:2025,value: [2015,2025],step:1})viewof selected_domains = Inputs.checkbox( unique_domains, {value: ["Health Sciences"],label:"Research Domains" })viewof selected_author_type = Inputs.radio( ["LU-affiliated first author","Non LU-affiliated first author","Both"], {value:"Both",label:"First Author Affiliation" })// Filter data based on selectionsfiltered_data = data.filter(d => {// Year filterconst yearOk = d.publication_year>= year_range[0] && d.publication_year<= year_range[1];// Domain filterconst domainOk = selected_domains.includes(d.primary_domain_name);// Author filterlet authorOk =false;if (selected_author_type ==="Both") { authorOk =true; } elseif (selected_author_type ==="LU-affiliated first author") { authorOk = d.is_lu_first_author===true; } elseif (selected_author_type ==="Non LU-affiliated first author") { authorOk = d.is_lu_first_author===false; }return yearOk && domainOk && authorOk;})// Combined HTML 
legend with domain colors and author typeshtml`<div style="margin-top: 20px; padding: 15px;"> <h4 style="margin-top: 0; margin-bottom: 15px;">Legend:</h4> <!-- Domain Colors --> <div style="margin-bottom: 15px;"> <h5 style="margin: 0 0 10px 0;">Research Domains:</h5> <div style="display: flex; flex-wrap: wrap; gap: 15px;">${unique_domains.map(domain =>` <div style="display: flex; align-items: center; gap: 6px;"> <div style="width: 16px; height: 16px; background-color: ${domain_colors.get(domain)}; border-radius: 2px;"></div> <span style="font-size: 14px;">${domain}</span> </div> `).join('')} </div> </div> <!-- Author Types --> <div> <h5 style="margin: 0 0 10px 0;">Author Types:</h5> <div style="display: flex; gap: 30px; align-items: center; flex-wrap: wrap;"> <div style="display: flex; align-items: center; gap: 8px;"> <svg width="30" height="20"> <line x1="0" y1="10" x2="30" y2="10" stroke="#333" stroke-width="2.5" /> <circle cx="15" cy="10" r="4" fill="#333" stroke="white" stroke-width="1"/> </svg> <span><strong>LU-affiliated first author</strong> (solid line + circle)</span> </div> <div style="display: flex; align-items: center; gap: 8px;"> <svg width="30" height="20"> <line x1="0" y1="10" x2="30" y2="10" stroke="#333" stroke-width="2.5" stroke-dasharray="8,8"/> <polygon points="15,6 19,14 11,14" fill="#333" stroke="white" stroke-width="1"/> </svg> <span><strong>Non-LU-affiliated first author</strong> (dashed line + triangle)</span> </div> </div> </div></div>`// Create the plot with dynamic x-axis based on filtered data//| echo: false// Create the plot with integer-only year ticksPlot.plot({width:800,height:500,marginLeft:60,marginBottom:60,x: {label:"Publication Year",domain: [Math.min(...filtered_data.map(d => d.publication_year)),Math.max(...filtered_data.map(d => d.publication_year))],tickFormat:"d",ticks: d3.range(Math.min(...filtered_data.map(d => d.publication_year)),Math.max(...filtered_data.map(d => d.publication_year)) +1 ) // Force integer ticks only 
},y: {label:"Number of Publications",grid:true },color: {domain: unique_domains,range: unique_domains.map(d => domain_colors.get(d)),legend:false },marks: [// Your existing marks here...// Solid lines for LU-affiliated authors Plot.line(filtered_data.filter(d => d.is_lu_first_author===true), {x:"publication_year",y:"total",stroke:"primary_domain_name",strokeWidth:2.5,z: d =>`${d.primary_domain_name}-LU` }),// Dashed lines for non-LU-affiliated authors Plot.line(filtered_data.filter(d => d.is_lu_first_author===false), {x:"publication_year",y:"total",stroke:"primary_domain_name",strokeWidth:2.5,strokeDasharray:"8,8",z: d =>`${d.primary_domain_name}-NonLU` }),// Points with different symbols Plot.dot(filtered_data, {x:"publication_year",y:"total",fill:"primary_domain_name",symbol: d => d.is_lu_first_author?"circle":"triangle",r:5,stroke:"white",strokeWidth:1 }),// Vertical line rule (only when "Both" is selected)...(selected_author_type ==="Both"? [ Plot.ruleX(filtered_data, Plot.pointerX({x:"publication_year",stroke:"gray",strokeWidth:1,strokeDasharray:"3,3",opacity:0.7 })) ] : []),// Enhanced tooltipsPlot.tip(filtered_data, selected_author_type ==="Both"? 
Plot.pointerX({x:"publication_year",title: (d, i, data) => {// Group all data points by year for this x-positionconst year = d.publication_year;const allPointsAtYear = data.filter(point => point.publication_year=== year);// Create title showing all domains and their values at this yearconst yearTitle =`Year: ${year}\n${'-'.repeat(20)}\n`;const entries = allPointsAtYear.map(point =>`${point.primary_domain_name}: ${point.total} (${point.is_lu_first_author?'LU':'Non-LU'})` ).join('\n');return yearTitle + entries; },fontSize:12 }) : Plot.pointer({x:"publication_year",y:"total",fill:"primary_domain_name",title: d =>`${d.primary_domain_name}\n${d.publication_year}\nPublications: ${d.total}\nLU First Author: ${d.is_lu_first_author?"Yes":"No"}` })) ],title:"Luxembourg Research Publications by Domain Over Time"})```OpenAlex employs a more granular classification system through subfields, whichprovides greater specificity than broad domains. Given that this taxonomyencompasses over 200 distinct subfields, this analysis focuses on the ten mostfrequently represented subfields in Luxembourg's research output:```{r}#| echo: falseojs_define(primary_subfield_lu_ojs = primary_subfield_lu)``````{ojs}//| echo: false// Convert R data to JavaScript formatraw_data_subfield =transpose(primary_subfield_lu_ojs)// Convert data types to ensure Observable Plot can use themdata_subfield = raw_data_subfield.map(d => ({publication_year:+d.publication_year,// Convert to numberprimary_subfield_name: d.primary_subfield_name,total:+d.total,// Convert to numberis_lu_first_author: d.is_lu_first_author==="TRUE"|| d.is_lu_first_author===true// Convert to boolean}))// Create color mapping for your actual subfieldssubfield_colors =newMap([ ["Aerospace Engineering","#FF6B35"],// Orange-red ["Artificial Intelligence","#003399"],// Deep blue ["Computer Networks and Communications","#228B22"],// Forest green ["Computer Vision and Pattern Recognition","#FF1493"],// Deep pink ["Economics and 
Econometrics","#800080"],// Purple ["Electrical and Electronic Engineering","#FFD700"],// Gold ["Information Systems","#DE2910"],// Red ["Materials Chemistry","#C8102E"],// Crimson ["Molecular Biology","#009246"],// Green ["Neurology","#AA151B"],// Dark red ["Political Science and International Relations","#FF7F00"],// Orange ["Pulmonary and Respiratory Medicine","#4B0082"],// Indigo ["Sociology and Political Science","#8B4513"] // Saddle brown])// Get unique values for controlsunique_subfields = [...newSet(data_subfield.map(d => d.primary_subfield_name))].sort()html`<div style="margin-top: 20px; padding: 15px;"> <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4> <label style="font-size: 16px; color: #333;">Select year range:</label></div>`viewof year_range_subfield =rangeInput({min:2015,max:2025,value: [2015,2025],step:1})viewof selected_subfields = Inputs.checkbox( unique_subfields, {value: ["Artificial Intelligence","Computer Vision and Pattern Recognition","Neurology"],label:"Research Subfields" })viewof selected_author_type_subfield = Inputs.radio( ["LU-affiliated first author","Non LU-affiliated first author","Both"], {value:"Both",label:"First Author Affiliation" })// Filter data based on selectionsfiltered_data_subfield = data_subfield.filter(d => {// Year filterconst yearOk = d.publication_year>= year_range_subfield[0] && d.publication_year<= year_range_subfield[1];// Subfield filterconst subfieldOk = selected_subfields.includes(d.primary_subfield_name);// Author filterlet authorOk =false;if (selected_author_type_subfield ==="Both") { authorOk =true; } elseif (selected_author_type_subfield ==="LU-affiliated first author") { authorOk = d.is_lu_first_author===true; } elseif (selected_author_type_subfield ==="Non LU-affiliated first author") { authorOk = d.is_lu_first_author===false; }return yearOk && subfieldOk && authorOk;})// Combined HTML legend with subfield colors and author typeshtml`<div style="margin-top: 20px; padding: 15px;"> <h4 
style="margin-top: 0; margin-bottom: 15px;">Legend:</h4> <!-- Subfield Colors --> <div style="margin-bottom: 15px;"> <h5 style="margin: 0 0 10px 0;">Research Subfields:</h5> <div style="display: flex; flex-wrap: wrap; gap: 15px;">${unique_subfields.map(subfield =>` <div style="display: flex; align-items: center; gap: 6px;"> <div style="width: 16px; height: 16px; background-color: ${subfield_colors.get(subfield) ||'#888'}; border-radius: 2px;"></div> <span style="font-size: 14px;">${subfield}</span> </div> `).join('')} </div> </div> <!-- Author Types --> <div> <h5 style="margin: 0 0 10px 0;">Author Types:</h5> <div style="display: flex; gap: 30px; align-items: center; flex-wrap: wrap;"> <div style="display: flex; align-items: center; gap: 8px;"> <svg width="30" height="20"> <line x1="0" y1="10" x2="30" y2="10" stroke="#333" stroke-width="2.5" /> <circle cx="15" cy="10" r="4" fill="#333" stroke="white" stroke-width="1"/> </svg> <span><strong>LU-affiliated first author</strong> (solid line + circle)</span> </div> <div style="display: flex; align-items: center; gap: 8px;"> <svg width="30" height="20"> <line x1="0" y1="10" x2="30" y2="10" stroke="#333" stroke-width="2.5" stroke-dasharray="8,8"/> <polygon points="15,6 19,14 11,14" fill="#333" stroke="white" stroke-width="1"/> </svg> <span><strong>Non-LU-affiliated first author</strong> (dashed line + triangle)</span> </div> </div> </div></div>`// Create the plot with integer-only year ticksPlot.plot({width:800,height:500,marginLeft:60,marginBottom:60,x: {label:"Publication Year",domain: [Math.min(...filtered_data_subfield.map(d => d.publication_year)),Math.max(...filtered_data_subfield.map(d => d.publication_year))],tickFormat:"d",ticks: d3.range(Math.min(...filtered_data_subfield.map(d => d.publication_year)),Math.max(...filtered_data_subfield.map(d => d.publication_year)) +1 ) },y: {label:"Number of Publications",grid:true },color: {domain: unique_subfields,range: unique_subfields.map(d => subfield_colors.get(d) 
||'#888'),legend:false },marks: [// Solid lines for LU-affiliated authors Plot.line(filtered_data_subfield.filter(d => d.is_lu_first_author===true), {x:"publication_year",y:"total",stroke:"primary_subfield_name",strokeWidth:2.5,z: d =>`${d.primary_subfield_name}-LU` }),// Dashed lines for non-LU-affiliated authors Plot.line(filtered_data_subfield.filter(d => d.is_lu_first_author===false), {x:"publication_year",y:"total",stroke:"primary_subfield_name",strokeWidth:2.5,strokeDasharray:"8,8",z: d =>`${d.primary_subfield_name}-NonLU` }),// Points with different symbols Plot.dot(filtered_data_subfield, {x:"publication_year",y:"total",fill:"primary_subfield_name",symbol: d => d.is_lu_first_author?"circle":"triangle",r:5,stroke:"white",strokeWidth:1 }),// Vertical line rule (only when "Both" is selected)...(selected_author_type_subfield ==="Both"? [ Plot.ruleX(filtered_data_subfield, Plot.pointerX({x:"publication_year",stroke:"gray",strokeWidth:1,strokeDasharray:"3,3",opacity:0.7 })) ] : []),// Enhanced tooltips Plot.tip(filtered_data_subfield, selected_author_type_subfield ==="Both"? 
      Plot.pointerX({
        x: "publication_year",
        title: (d, i, data) => {
          // Group all data points by year for this x-position
          const year = d.publication_year;
          const allPointsAtYear = data.filter(point => point.publication_year === year);
          // Create title showing all subfields and their values at this year
          const yearTitle = `Year: ${year}\n${'-'.repeat(20)}\n`;
          const entries = allPointsAtYear.map(point =>
            `${point.primary_subfield_name}: ${point.total} (${point.is_lu_first_author ? 'LU' : 'Non-LU'})`
          ).join('\n');
          return yearTitle + entries;
        },
        fontSize: 12
      }) : Plot.pointer({
        x: "publication_year",
        y: "total",
        fill: "primary_subfield_name",
        title: d => `${d.primary_subfield_name}\n${d.publication_year}\nPublications: ${d.total}\nLU First Author: ${d.is_lu_first_author ? "Yes" : "No"}`
      })
    )
  ],
  title: "Luxembourg Research Publications by Subfield Over Time"
})
```

## International Collaboration Patterns

This section examines Luxembourg's research collaboration patterns with
international partners. The dataset records co-author affiliations, which makes
it possible to identify the most frequent collaborating countries and regions.
The geographical distribution of research partnerships is presented by
publication year and distinguished by Luxembourg's role as lead author versus
collaborative partner.

The data processing groups countries into meaningful categories, including major
individual nations, regional blocs, and an aggregated "Others" category for
countries with lower collaboration frequencies.
This approach provides clarity while maintaining analytical depth regarding
Luxembourg's primary research partnerships:

```{r}
#| include: false
#| warning: false
#| fig-height: 14
#| fig-width: 12
country_colors <- c(
  "European Union" = "#003399", # Deep blue (EU flag)
  "Others" = "#FF1493", # Deep pink
  "Luxembourg" = "#FF6B35", # Orange-red
  "France" = "#800080", # Purple
  "USA" = "#228B22", # Forest green
  "Belgium" = "#FFD700", # Gold
  "China" = "#DE2910", # Red
  "Great Britain" = "#C8102E", # British red (Union Jack)
  "Italy" = "#009246", # Italian green (flag)
  "Spain" = "#AA151B", # Spanish red (flag)
  "Switzerland" = "#FF0000", # Swiss red (flag)
  "Netherlands" = "#FF7F00" # Dutch orange (national color)
)

# Plot for publication with co-authors
ggplot(country_authors_unique, aes(x = publication_year, y = n, fill = country_groups)) +
  geom_col(position = "dodge", color = "black", size = 0.5) + # Dodge bars for multiple countries per year
  scale_fill_manual(values = country_colors) +
  facet_wrap(
    ~is_lu_first_author,
    ncol = 1,
    labeller = labeller(
      is_lu_first_author = c(
        "FALSE" = "LU Not First Author",
        "TRUE" = "LU First Author"
      )
    )
  ) +
  labs(
    title = "Publications by Country/Region Over Time",
    subtitle = "Faceted by Luxembourg First Author Status",
    x = "Publication Year",
    y = "Total Publications",
    fill = "Country/Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 10),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.position = "bottom",
    legend.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold", size = 11),
    strip.background = element_rect(fill = "lightgray", color = "black")
  ) +
  # Adjust legend to show in multiple rows if needed
  guides(fill = guide_legend(nrow = 2, byrow = TRUE))
```

```{r}
#| echo: false
ojs_define(country_authors_unique_ojs = country_authors_unique)
```

```{ojs}
//| echo: false
// Convert R data to JavaScript format
raw_country_data = transpose(country_authors_unique_ojs)

// Convert data types to ensure Observable Plot can use them
country_data = raw_country_data.map(d =>
({
  publication_year: +d.publication_year,
  is_lu_first_author: d.is_lu_first_author === "TRUE" || d.is_lu_first_author === true,
  country_groups: d.country_groups,
  n: +d.n
}))

// Create color mapping for countries
country_colors = new Map([
  ["European Union", "#003399"],
  ["Others", "#FF1493"],
  ["Luxembourg", "#FF6B35"],
  ["France", "#800080"],
  ["USA", "#228B22"],
  ["Belgium", "#FFD700"],
  ["China", "#DE2910"],
  ["Great Britain", "#C8102E"],
  ["Italy", "#009246"],
  ["Spain", "#AA151B"],
  ["Switzerland", "#FF0000"],
  ["Netherlands", "#FF7F00"]
])

// Get unique values for controls
unique_years = [...new Set(country_data.map(d => d.publication_year))].sort()
unique_countries = [...new Set(country_data.map(d => d.country_groups))].sort()

html`<div style="margin-top: 20px; padding: 15px;">
  <h4 style="margin-top: 0; margin-bottom: 15px;">Inputs:</h4>
  <label style="font-size: 16px; color: #333;">Select year range:</label>
</div>`

viewof country_year_range = rangeInput({
  min: 2015,
  max: 2025,
  value: [2022, 2025],
  step: 1
})

viewof selected_countries = Inputs.checkbox(
  unique_countries,
  {
    value: unique_countries,
    label: "Countries/Regions"
  }
)

viewof country_author_type = Inputs.radio(
  ["LU-affiliated first author", "Non LU-affiliated first author", "Both"],
  {
    value: "Both",
    label: "First Author Affiliation"
  }
)

// Filter data based on selections
filtered_country_data = country_data.filter(d => {
  // Year filter
  const yearOk = d.publication_year >= country_year_range[0] &&
    d.publication_year <= country_year_range[1];
  // Country filter
  const countryOk = selected_countries.includes(d.country_groups);
  // Author filter
  let authorOk = false;
  if (country_author_type === "Both") {
    authorOk = true;
  } else if (country_author_type === "LU-affiliated first author") {
    authorOk = d.is_lu_first_author === true;
  } else if (country_author_type === "Non LU-affiliated first author") {
    authorOk = d.is_lu_first_author === false;
  }
  return yearOk && countryOk && authorOk;
})

// Get years present in filtered data for x-axis
years_present = [...new Set(filtered_country_data.map(d =>
d.publication_year))].sort()

// Legend for countries
html`<div style="margin-top: 20px; padding: 15px;">
  <h4 style="margin-top: 0; margin-bottom: 15px;">Legend:</h4>
  <div style="margin-bottom: 15px;">
    <h5 style="margin: 0 0 10px 0;">Countries/Regions:</h5>
    <div style="display: flex; flex-wrap: wrap; gap: 15px;">
      ${unique_countries.map(country => `
        <div style="display: flex; align-items: center; gap: 6px;">
          <div style="width: 16px; height: 16px; background-color: ${country_colors.get(country)}; border: 1px solid black; border-radius: 2px;"></div>
          <span style="font-size: 14px;">${country}</span>
        </div>
      `).join('')}
    </div>
  </div>
</div>`

// Create faceted grouped bar chart - using fx for grouping and fy for LU status
Plot.plot({
  width: 900,
  height: 600,
  marginLeft: 80,
  marginBottom: 80,
  marginRight: 40,
  x: {
    label: null,
    axis: null,
    paddingOuter: 0.2
  },
  y: {
    label: "Total Publications",
    grid: true
  },
  fx: {
    label: "Publication Year",
    tickFormat: "d"
  },
  fy: {
    label: null,
    tickFormat: d => d ? "LU First Author" : "LU Not First Author"
  },
  color: {
    domain: unique_countries,
    range: unique_countries.map(d => country_colors.get(d)),
    legend: false
  },
  facet: {
    data: filtered_country_data,
    x: "publication_year",
    y: "is_lu_first_author",
    marginTop: 40
  },
  marks: [
    Plot.barY(filtered_country_data, Plot.groupX(
      {y: "sum"},
      {
        x: "country_groups",
        y: "n",
        fill: "country_groups",
        stroke: "black",
        strokeWidth: 0.5,
        tip: true,
        title: d => `${d.country_groups}\n${d.publication_year}\nPublications: ${d.n}\nLU First Author: ${d.is_lu_first_author ? "Yes" : "No"}`
      }
    )),
    Plot.frame({stroke: "black", strokeWidth: 1})
  ],
  style: {
    fontSize: "12px"
  },
  title: "Publications by Country/Region Over Time (Faceted by Luxembourg First Author Status)"
})
```

The table below shows citation statistics by year and domain:

```{r}
#| echo: false
#| label: tbl-citation
#| tbl-cap: Citation statistics
citation_stats <- citation_data_stats %>%
  mutate(publication_year = as.integer(publication_year), .before = everything()) %>%
  rename(
    `Publication year` = "publication_year",
    `Primary domain name` = "primary_domain_name",
    `First author affiliation` = "is_lu_first_author",
    `25% quantile` = "q_25",
    `Median` = "median",
    `75% quantile` = "q_75",
    `95% quantile` = "q_95",
    `99% quantile` = "q_99",
    `Maximum` = "max"
  )

datatable(
  citation_stats,
  caption = "Citation statistics",
  filter = list(position = 'top', clear = FALSE),
  options = list(
    pageLength = 15,
    scrollX = TRUE,
    dom = 'Bfrtip',
    buttons = c('excel')
  ),
  extensions = 'Buttons'
)
```

As expected, citation patterns, regardless of discipline, publication year, or
author affiliation, tend to follow a *power-law* distribution. Most research
articles receive relatively few citations, as shown by the low median. In the
health sciences, for LU-affiliated lead authors, only 25% of articles published
in 2024 received at least 2 citations (see the 75% quantile column in the table
above), and just 1% reached 16 citations or more. However, citation counts
accumulate over time, so it is more informative to consider older publications.
For the same group, among articles published in 2019, 25% received at least 18
citations, and 1% reached 147 or more. The highest citation count observed for a
single article in this group was 482.

Luxembourg has three official languages, but English is the language of science
regardless of domain:

```{r}
#| echo: false
#| label: tbl-languages
#| tbl-cap: Distribution of languages used to write articles
languages <- languages %>%
  mutate(publication_year = as.integer(publication_year), .before = everything()) %>%
  rename(
    `Publication year` = "publication_year",
    `Primary domain name` = "primary_domain_name",
    `First author affiliation` = "is_lu_first_author"
  )

datatable(
  languages,
  caption = "Distribution of languages used to write articles",
  filter = list(position = 'top', clear = FALSE),
  options = list(
    pageLength = 15,
    scrollX = TRUE,
    dom = 'Bfrtip',
    buttons = c('excel')
  ),
  extensions = 'Buttons'
)
```
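
As a reading aid for the quantile columns of the citation statistics table, here is a minimal JavaScript sketch that is not part of the report's pipeline. The citation counts are made up for illustration, and `citations`, `quantile`, and `shareAtLeast` are hypothetical names introduced here. The sketch assumes the linear-interpolation rule that R's `quantile()` uses by default, which may differ from how the table was actually computed. It illustrates why a 75% quantile of *q* means that roughly a quarter of articles have at least *q* citations, and how a few highly cited articles pull the mean well above the median.

```javascript
// Hypothetical citation counts for ten articles (illustrative values only,
// not taken from the OpenAlex data used in this report).
const citations = [0, 0, 1, 1, 2, 3, 4, 6, 18, 147];

// Quantile with linear interpolation between neighbouring order statistics
// (assumed here to mirror R's default quantile() rule).
function quantile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Share of articles with at least `threshold` citations.
function shareAtLeast(values, threshold) {
  return values.filter(v => v >= threshold).length / values.length;
}

const median = quantile(citations, 0.5);
const q75 = quantile(citations, 0.75);
const mean = citations.reduce((sum, v) => sum + v, 0) / citations.length;

console.log({ median, q75, mean, shareAboveQ75: shareAtLeast(citations, q75) });
```

With only ten values, the share at or above the 75% quantile is not exactly 25%; the two figures converge as the number of articles grows.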