Morris Alper
@malper.bsky.social
PhD student researching multimodal learning (language, vision, ...).
Also a linguistics enthusiast.
morrisalp.github.io
ConlangCrafter could potentially be used in pedagogy, typological and NLP work, and many entertainment applications. Imagine a video game where aliens can speak countless new procedurally-generated languages.
October 11, 2025 at 5:35 AM
To enhance consistency and diversity, our pipeline incorporates randomness injection and self-refinement mechanisms. We measure these properties with our novel evaluation framework, providing rigorous evaluation for the new task of computational conlanging.
October 11, 2025 at 5:35 AM
The ConlangCrafter pipeline harnesses an LLM to generate a description of a constructed language, self-refining it in the process. We decompose language creation into phonology, grammar, and lexicon, then translate sentences while constructing new grammar points as needed.
October 11, 2025 at 5:35 AM
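The staged decomposition with randomness injection and self-refinement might be sketched roughly as below. The `llm` function is a hypothetical stand-in for a real model call, and the prompts, stage order, and refinement count are illustrative assumptions, not the paper's actual implementation.

```python
import random

def llm(prompt: str) -> str:
    # Stand-in for a real LLM API call (hypothetical; swap in your provider).
    return f"<generated from: {prompt[:40]}...>"

def refine(description: str, rounds: int = 2) -> str:
    # Self-refinement: ask the model to critique and revise its own output.
    for _ in range(rounds):
        description = llm(f"Critique and revise for consistency:\n{description}")
    return description

def craft_conlang(seed_idea: str) -> dict:
    # Randomness injection: a sampled seed nudges each stage toward diversity.
    seed = random.randrange(10**6)
    phonology = refine(llm(f"[seed {seed}] Design a phonology for: {seed_idea}"))
    grammar = refine(llm(f"[seed {seed}] Design a grammar given:\n{phonology}"))
    lexicon = refine(llm(f"[seed {seed}] Build a lexicon given:\n{grammar}"))
    return {"phonology": phonology, "grammar": grammar, "lexicon": lexicon}
```

Each stage conditions on the previous stages' output, which is what lets the lexicon stay consistent with the grammar and phonology.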
Conlangs (Constructed Languages), from Tolkien’s Elvish to Esperanto, have long been created for artistic, philosophical, or practical purposes.
As generative AI proves its creative power, we ask:
Can it also take on the laborious art of conlang creation?
October 11, 2025 at 5:35 AM
The number of languages in the world just got a lot higher! At least constructed ones.
Meet ConlangCrafter - a pipeline for creating novel languages with LLMs.
A Japanese-Esperanto creole? An alien cephalopod color-based language?
Enter your idea and see a conlang emerge. 🧵👇
October 11, 2025 at 5:35 AM
At inference time, we inject the appearance of the observed view to get consistent novel views. This also enables cool applications like appearance-conditioned NVS! (4/5)
June 17, 2025 at 4:16 PM
To learn from this data, we use a novel multi-view diffusion architecture adapted from CAT3D, modeling appearance variations with a bottleneck encoder applied to VAE latents and disambiguating scene scale via warping. (3/5)
June 17, 2025 at 4:16 PM
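The bottleneck idea can be illustrated with a toy NumPy sketch: pooling a view's VAE latents and squeezing them through a few dimensions so only global appearance (lighting, tone) can pass, not scene geometry. The shapes, pooling, and random weights here are illustrative assumptions, not the actual WildCAT3D architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_encode(latents: np.ndarray, dim: int = 4) -> np.ndarray:
    # Pool a view's VAE latents to one vector, then project it through a
    # low-dimensional bottleneck: only a few numbers survive, so global
    # appearance fits but per-pixel scene structure cannot.
    pooled = latents.mean(axis=(1, 2))                # (C,) global average pool
    W_down = rng.standard_normal((dim, pooled.size))  # stand-ins for learned weights
    W_up = rng.standard_normal((pooled.size, dim))
    code = W_down @ pooled                            # (dim,) appearance code
    return W_up @ code                                # embedding fed to the denoiser

latents = rng.standard_normal((8, 32, 32))            # (C, H, W) VAE latent grid
emb = bottleneck_encode(latents)
```

At inference, conditioning on a chosen view's code is what enables appearance-conditioned novel view synthesis.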
Photos like the ones below differ in global appearance (day vs. night, lighting), aspect ratio, and even weather. But they give clues to how scenes are built in 3D. (2/5)
June 17, 2025 at 4:16 PM
💥New preprint! WildCAT3D uses tourist photos in-the-wild as supervision to learn to generate novel, consistent views of scenes like the one shown below. h/t Tom Monnier and all collaborators (1/5)
June 17, 2025 at 4:16 PM
Finally we show that ProtoSnap-aligned skeletons can be used as conditions for a ControlNet model to generate synthetic OCR training data. By controlling the shapes of signs in training, we can achieve SOTA on cuneiform sign recognition. (Bottom: synthetically generated sign images)
February 4, 2025 at 6:24 PM
Our results show that ProtoSnap effectively aligns wedge-based skeletons to scans of real cuneiform signs, with global and local refinement steps. We provide a new expert-annotated test set to quantify these results.
February 4, 2025 at 6:24 PM
ProtoSnap uses features from a fine-tuned diffusion model to optimize for the correct alignment between a skeleton matched with a prototype font image and a scanned sign. Perhaps surprising that image generation models can be applied to this sort of discriminative task!
February 4, 2025 at 6:24 PM
We tackle this by directly measuring the internal configuration of characters. Our approach ProtoSnap "snaps" a prototype (font)-based skeleton onto a scanned cuneiform sign using a multi-stage pipeline with SOTA methods from computer vision and generative AI.
February 4, 2025 at 6:24 PM
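The global-alignment step can be sketched as a toy search: slide the prototype skeleton over the scan's feature map and keep the placement whose sampled features best match the prototype's (cosine similarity). This is a simplified illustration; ProtoSnap itself uses diffusion-model features, richer transforms than a pixel shift, and a per-wedge local refinement stage.

```python
import numpy as np

def align_skeleton(proto_pts, feat_proto, feat_scan):
    # proto_pts: (y, x) skeleton keypoints from the font prototype.
    # feat_proto: one feature vector per keypoint; feat_scan: (H, W, D) map.
    H, W, _ = feat_scan.shape
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-4, 5):            # coarse search over translations
        for dx in range(-4, 5):
            score = 0.0
            for (y, x), f in zip(proto_pts, feat_proto):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    g = feat_scan[yy, xx]
                    score += f @ g / (np.linalg.norm(f) * np.linalg.norm(g) + 1e-8)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    # Snap the skeleton to the best-scoring placement on the scan.
    return [(y + best_shift[0], x + best_shift[1]) for y, x in proto_pts]
```

A real system would optimize a full affine or thin-plate warp rather than grid-searching shifts, but the objective — feature agreement between prototype and scan — is the same.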
Some prior work has tried to classify scans of signs categorically, but signs' shapes differ drastically across time periods and regions, making this less effective. E.g. both signs below are AN, from different eras. (Top: font prototype; bottom: scan of a sign from a real tablet)
February 4, 2025 at 6:24 PM
Cuneiform at #ICLR2025! ProtoSnap finds the configuration of wedges in scanned cuneiform signs for downstream applications like OCR. A new tool for understanding the ancient world!
tau-vailab.github.io/ProtoSnap/
h/t Rachel Mikulinsky @ShGordin @ElorHadar and all collaborators.
🧵👇
February 4, 2025 at 6:24 PM
We show that our dataset serves as a new, challenging benchmark for common floorplan understanding tasks such as semantic segmentation. We also show it can be used to enable new tasks such as floorplan generation conditioned on building type and boundary.
December 10, 2024 at 4:20 PM
We use modern foundation models (LLMs, vision-language models) to filter and structure raw, noisy open data to identify floorplan images and extract structured metadata, including global properties (e.g. floorplan type) and grounded architectural features within images.
December 10, 2024 at 4:20 PM
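The filter-and-structure step might look something like the sketch below: a vision-language model captions each candidate image, and an LLM turns the caption into structured metadata used to keep or drop it. Both model calls are hypothetical deterministic stubs here; a real system would prompt actual foundation models for JSON output and parse their responses.

```python
def vlm_describe(image_path: str) -> str:
    # Stand-in for a vision-language model captioning call (hypothetical API).
    return "A labeled floorplan of a two-story house with rooms marked."

def llm_extract(caption: str) -> dict:
    # Stand-in for an LLM that converts a free-text caption into structured
    # metadata (illustrative keyword rules in place of a real model).
    text = caption.lower()
    return {
        "is_floorplan": "floorplan" in text,
        "building_type": "house" if "house" in text else "unknown",
    }

def curate(image_paths):
    # Filter noisy open data down to floorplan records with metadata attached.
    records = []
    for path in image_paths:
        meta = llm_extract(vlm_describe(path))
        if meta["is_floorplan"]:
            records.append({"image": path, **meta})
    return records
```

Grounded architectural features within images would need an additional localization step (e.g. a detector or a grounding-capable VLM) on top of this global filtering.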
WAFFLE (WikipediA-Fueled FLoorplan Ensemble) is a multimodal dataset of ~20K diverse floorplans, of many building types (e.g. homes, churches, hospitals, schools, ...), regions, eras, and data formats, along with structured metadata.
December 10, 2024 at 4:20 PM
Bite into WAFFLE 🧇, our new multimodal floorplan dataset and paper - now accepted to #WACV2025!
Work with Keren Ganon, Rachel Mikulinsky, Hadar Elor.
More info below👇
December 10, 2024 at 4:20 PM