In July 2015, the MoMA released the metadata for its collection on GitHub. People have used it to classify artworks by artist gender and by acquisition date, and even to create a Twitter bot that generates fake art descriptions.
To see how far users could go with the data, the MoMA organised a datathon with the help of NYU. For two days in February, multidisciplinary teams made up of technical people and people with an art history background explored data from the MoMA, the Cooper Hewitt, the Tate, the Carnegie Museum of Art, the SFMOMA and the Frick Collection.
The datathon started with presentations on what can be done with data by Adriana Crespo-Tenori, Lead Researcher at Facebook, and Lev Manovich, who worked with MoMA’s dataset and gained fame thanks to his project Selfie City. After this introduction, which gave participants a baseline knowledge of the subject, they had just under 24 hours to find an interesting way to use the museums’ data and then present it to the judges.
The two winning projects were a solution examining curatorial approaches through language analysis of exhibition titles, and a study of the artists most exhibited at the MoMA over time. Thanks to the latter, the MoMA confirmed its love for Pablo Picasso, who came first by far!
Other projects explored the relationship between colours in paintings and the geographical background of the artists, compared the gender of artists in the MoMA’s and SFMOMA’s collections, or ran semantic analysis on works in the MoMA’s collection.
The datathon allowed the MoMA to approach its collection from a new angle, see what people are excited about and upload new datasets to GitHub based on the experience.
The MoMA datathon reminds us of the API{dot}ART event that took place last May in Paris with Images d’Art, which combined data with maker devices such as 3D printers and Raspberry Pis. A year after their datathon, they will tell you everything about their findings during a workshop at We Are Museums. Don’t miss it!

