SILICON @Stanford
stanfordsilicon.bsky.social
Stanford Initiative on Language Inclusion and Conservation in Old and New Media | Advancing Digitally Disadvantaged Languages @Stanford
The phenomenon of "I can't not do this"... until "I can't do this" underpins so much of this work. #FaceInterface
January 19, 2025 at 2:15 AM
Audience comment: Look at the foundations you're building on. With software, realized it wasn't a stable foundation and had to rewrite a lot of code. A tool nearly collapsed because of macOS updates; it took multiple people 18 months to fix and nearly crippled a whole corner of the ecosystem.
January 19, 2025 at 2:14 AM
Audience comment: so many projects are passions and hobbies, 80%+ donated time. Started thinking about business models and revenue streams from day 1, instead of making it and open-sourcing it and realizing I couldn't maintain it any longer. #FaceInterface
January 19, 2025 at 2:13 AM
Marc Weber: the question of freezing language in amber -- the ship sailed with writing. If you don't update to the current changes in writing, you're doing harm. It's a much more weighty decision to orphan the languages. Necessary but not sufficient to encode languages. #FaceInterface
January 19, 2025 at 2:06 AM
Hrant Papazian: "Preservation is for museums. Young people don't like to go to museums because they don't have a past. Young people want to make a future, and the future is the only way out. We have to build the future and get tools for the future. Young women matter more for Nüshu." #FaceInterface
January 19, 2025 at 2:02 AM
@tsmullaney.bsky.social "Open source" is the great-great-grandchild of this romantic idea, and there are communities that don't buy it. How does one confront this conflict between "shared heritage" and linguistic/cultural ownership? "Uni-" has a history; are we structurally repeating it? #FaceInterface
January 19, 2025 at 2:00 AM
@tsmullaney.bsky.social Notion of shared human heritage is a free-market concept, an open sound-stage of existence. Romantic vision of a shared destiny, beauty, etc. But when that vision was literally the guiding principle of marching gunboats to remove an obstacle... 👀 #FaceInterface
January 19, 2025 at 1:58 AM
Kamal Mansour: "What we're speaking about here is enabling digital writing, which is not the same as language. By representing the writing of particular languages in Unicode, we enable them to create digital patrimony if they choose. It doesn't mean that's all they create." #FaceInterface
January 19, 2025 at 1:54 AM
Audience comment: "I'm constantly running into how Unicode screws up my world. It doesn't take into consideration the full things necessary for the expression of the language. Nothing about the rules of typography and representation in layout. When fonts disagree you have chaos." #FaceInterface
January 19, 2025 at 1:53 AM
Anushah Hossain: Holding up Unicode inclusion or digitization as the ultimate goal... is that too simplistic? What's the goal we're trying to achieve?
January 19, 2025 at 1:52 AM
Peter Constable: "We should be enablers of local choice. And yes, that means that some languages will die. Make sure local communities know they have choices, choices are there, and we're there to support when they choose a path where we can help." #FaceInterface
January 19, 2025 at 1:50 AM
@tsmullaney.bsky.social "Do you want to see the world saved, or do you want to be the one who saved the world?" In a structural way, this leads to fragmentation and silo-ization. Ego plays a part. What organization says "let's merge, and I'll take your name"? #FaceInterface
January 19, 2025 at 1:45 AM
Sina Ahmadi closes out the #FaceInterface slides the best way: "these are some dolma my wife and I made."
January 19, 2025 at 1:11 AM
Sina Ahmadi: Goal is to create a machine translation system. Limited amount of data, so fine-tuning existing models. Meta's No Language Left Behind (NLLB) model covers 200 languages. NLLB gets super-low BLEU scores for these languages; fine-tuning brought big improvements. #FaceInterface
January 19, 2025 at 1:08 AM
Sina Ahmadi: Hawrami had almost 10 people contributing. 46 hours of speech data collected using DOLMA speech bot. Used same multilingual corpus, asked people to select language and read sentences. 28k utterances! #FaceInterface
January 19, 2025 at 1:06 AM
Sina Ahmadi: Gave volunteers a set of sentences in a highly resourced language they know and in English. Community-driven multilingual parallel corpus, > 50,000 sentences total. Previously some of the languages only had 100 sentences online. All sentences aligned with English. #FaceInterface
January 19, 2025 at 1:05 AM
Sina Ahmadi: Some skepticism: "adding fuel to cultural hegemony of Turkish language". (Someone on Reddit was mad because of the dolma reference.) Dolma is a food, but he thought it'd be a good name because it'd be outside of politics. It's an acronym! Nothing to do with Turkish! #FaceInterface
January 19, 2025 at 1:02 AM
Sina Ahmadi: Vision was community building, data collection, NLP development, scientific dissemination, sustainability & impact -- in that order. Intensive outreach campaign last fall to publishers, language experts, academics, native speakers. 30 highly active volunteers. #FaceInterface
January 19, 2025 at 1:01 AM