Marvik Digest #2

July 5th, 2022

Marvik

Welcome to the latest Marvik Digest 🚀

This month we have some interesting stories involving multi-GAN optimization, Microsoft’s new IoT Insider Lab, speech-to-speech translation models, advancements in transformer architectures, and more.

Want us to cover a specific topic? DM us or email [email protected] with your suggestions. Stay tuned!

InsetGAN

In the realm of Computer Vision, generation of full-body human images is still a huge challenge🧍‍♀️🧍‍♂️.

Every person is different: each of us has a unique identity, appearance, shape and pose.

Generative adversarial networks (GANs) have emerged as a successful image generation paradigm.

However, they struggle with classes that show complex variations, such as full human bodies.

In a recent paper from Adobe Research, KAUST and University College London, the authors propose InsetGAN, a method that combines multiple pretrained GANs: one GAN generates a global canvas, and a series of specialized GANs focus on different body parts that can be inserted into it.

Main takeaways:

Introduces a multi-GAN optimization framework that jointly optimizes the latent codes of two or more collaborative generators such that the final image, formed by inserting the part insets on the canvas, does not exhibit any seams (e.g., a face, when added to the body, will be consistent in skin tone, clothing boundaries, and hair flow).
Different canvas/part GANs can be trained at different resolutions, thus lowering the data (quality) requirements.
Demonstrates the setup by combining a full-body GAN with a dedicated high-quality face GAN to produce plausible-looking humans.
Tested on a custom dataset and evaluated with quantitative metrics and user studies.
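
To make the joint optimization idea more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the two "generators" are hypothetical toy linear functions, the only loss is a squared difference along the seam, and gradients are estimated numerically. The names canvas_border, inset_border and seam_loss are our own illustrations. The point is only to show two latent codes being updated together so that the inset and the canvas agree where they meet.

```python
def canvas_border(w):
    """Hypothetical toy canvas generator: latent w -> 3 pixel values along the seam."""
    return [w[0] + 0.5 * w[1], 0.5 * w[0] - w[1], w[0] + w[1]]


def inset_border(v):
    """Hypothetical toy part generator: latent v -> the 3 matching inset pixels."""
    return [2.0 * v[0], v[1] - 1.0, v[0] + 0.5 * v[1]]


def seam_loss(w, v):
    """Squared mismatch between canvas and inset pixels along the seam."""
    return sum((c - p) ** 2 for c, p in zip(canvas_border(w), inset_border(v)))


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of f at x (x is a list of floats)."""
    grad = []
    for i in range(len(x)):
        plus, minus = x[:], x[:]
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def joint_optimize(w, v, steps=500, lr=0.05):
    """Gradient descent on both latent codes at once (assumed toy setting)."""
    for _ in range(steps):
        grad_w = numerical_grad(lambda x: seam_loss(x, v), w)
        grad_v = numerical_grad(lambda x: seam_loss(w, x), v)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


w_start, v_start = [1.0, -1.0], [0.5, 2.0]
w_opt, v_opt = joint_optimize(w_start, v_start)
print("seam loss before:", seam_loss(w_start, v_start))
print("seam loss after: ", seam_loss(w_opt, v_opt))
```

In a real setup the generators would be pretrained networks and the loss would need extra terms to keep each image realistic; this sketch keeps only the seam term.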

👉 Find out more here https://bit.ly/3tjNJuP

👉 Visit marvik.ai or reach out to [email protected] to learn more about our experience using GANs.

Weekend Getaway

A few days ago we had the chance to share some incredible moments during our team getaway. We spent the whole weekend in a beautiful house, surrounded by nature and breathtaking landscapes 🍂🌳.

There was room for everything: board games by the fireplace 🔥, spirited ping-pong competitions 🏓, and improvised guitar jams and sing-alongs 🎸.

In addition to this, part of the team volunteered to cook and delighted us with a nice Uruguayan barbecue and mouth-watering arepas 🇻🇪.

Even more rewarding was witnessing the presence of most of the Marvik team, both from Uruguay 🇺🇾 and different parts of Argentina 🇦🇷.

For some of them, it was their first time visiting 🇺🇾, and certainly the first time we met in person. Our team keeps growing and growing, and this is just the beginning.

🚀 Will you risk missing our next getaway?

Make sure that doesn't happen. 👉 Click here to see all our open positions, or drop us an email at [email protected] to find out more.

Microsoft IoT Insider Lab

📢 Some great news for the artificial intelligence community in Latin America 📢

Microsoft has chosen Uruguay 🇺🇾 to host its new AI & IoT Insider Lab, the first of its kind in the region and only the third outside the US 🇺🇸💡

This is game-changing given the growing impact of AI & IoT on the way people, devices and data interact in all aspects of life. Moreover, it puts Uruguay on the path to becoming an "innovation hub" for the region, acting as a facilitator of innovation and creativity to transform business realities.

🚀 The lab’s mission is to show startups, corporations and organizations across industries how to leverage AI and IoT technologies to solve related challenges, while providing guidance and recommendations from experts so they can achieve their full potential.

The lab will offer:

Experience-based knowledge from experts: electrical engineers, cloud engineers, data scientists, program managers, project managers, and software engineers.
On-demand dedication from highly qualified Microsoft collaborators.
Project management, design, architecture, prototyping, and post-implementation customer and partner guidance.

👉 More on this initiative here https://bit.ly/3NPyNgk

👉 If you’re curious about how Microsoft’s AI & IoT Labs work, click here https://bit.ly/3NSyu46

New speech-to-speech translation model

Meta AI has recently released a new research paper on speech-to-speech translation (S2ST) that does not rely on text generation as an intermediate step.

This method enables faster inference and supports translation between unwritten languages, which matters because more than 40% of the world's languages have no writing system. Instead of the traditional approach of translating source speech into target speech spectrograms, the authors use discretized speech units obtained by clustering self-supervised speech representations.
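
As a rough illustration of what "discretized speech units" means, the sketch below maps each frame-level feature vector to the index of its nearest cluster centroid. Everything here is assumed for illustration: the 2-D features, the two centroids, and the helper names are hypothetical, and collapsing consecutive repeated units is included as one possible post-processing choice rather than a description of the paper's exact pipeline. In practice the centroids would come from clustering features produced by a self-supervised speech model.

```python
def nearest_unit(frame, centroids):
    """Return the index of the centroid closest to this feature frame."""
    distances = [
        sum((f - c) ** 2 for f, c in zip(frame, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


def discretize(frames, centroids):
    """Turn a sequence of feature frames into a sequence of unit IDs."""
    return [nearest_unit(frame, centroids) for frame in frames]


def collapse_repeats(units):
    """Merge consecutive duplicate units (an assumed, optional step)."""
    collapsed = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


# Hypothetical toy data: 5 frames of 2-D features and 2 learned centroids.
toy_centroids = [[0.0, 0.0], [1.0, 1.0]]
toy_frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 1.0], [0.1, 0.1]]

toy_units = discretize(toy_frames, toy_centroids)  # [0, 0, 1, 1, 0]
print(toy_units, collapse_repeats(toy_units))      # collapsed: [0, 1, 0]
```

A translation model can then be trained to predict such unit sequences for the target language, with a separate vocoder turning units back into audio.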

Main achievements:

First of its kind to be trained on real-world, open-source audio data for multiple language pairs
Outperforms previous direct S2ST systems in terms of runtime, FLOPs, and max memory
Leverages pretraining with unlabeled speech data

👉 Click here to learn more https://bit.ly/3HEetvS

DIET Transformer

In our latest blog post, our ML engineer Diego Sellanes discusses DIET, Rasa’s latest transformer architecture, which handles both entity recognition and intent classification. He explains how it works, walks through its different modules, and outlines its main advantages over similar models.

“RASA’s DIET transformer has a very powerful architecture. It proposes a new way of understanding state-of-the-art transformers, with a clever loss function which sums up every aspect of the model.”

👉 Visit our blog for the full story here

At Marvik, we have used Transformers to execute several NLP projects. DM or reach out to [email protected] if you are curious about how you could apply them to enhance your NLP models.

YOLOv6

YOLOv6 is finally out 🚀

YOLOv6 is a single-stage object detection framework dedicated to industrial applications, with a hardware-friendly, efficient design and high performance.

Main takeaways:

Efficient Decoupled Head with SIoU Loss
Hardware-friendly Design for Backbone/Neck
Detection accuracy and inference speed far exceed those of YOLOv5
Released under the GNU General Public License v3.0
Coming soon: more deployment options and quantization tools
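
For readers new to box-regression losses, the SIoU loss mentioned above builds on the standard Intersection over Union (IoU) between a predicted and a ground-truth box. The sketch below computes plain IoU only, plus the simple 1 - IoU loss; the additional penalty terms that SIoU adds are deliberately left out, and the (x1, y1, x2, y2) corner format and function names are assumptions of this example, not YOLOv6's code.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Basic IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))       # overlap 1, union 7
print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
```

Plain IoU gives no gradient signal when boxes do not overlap, which is one reason detection frameworks adopt extended variants.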

👉 Check out the repo here https://bit.ly/3AaQHpy

Parti Model

Google AI has recently launched the Pathways Autoregressive Text-to-Image model (Parti), its second text-to-image generator model 📢

Parti uses an autoregressive model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge.

Highlights:

Treats text-to-image generation as a sequence-to-sequence modeling problem (akin to machine translation) → allows it to benefit from advances in large language models.
Shows consistent quality improvements by scaling its encoder-decoder up to 20B parameters.
Achieves a state-of-the-art zero-shot FID score.
Complementary to Imagen (its predecessor) in exploring two different families of generative models (autoregressive and diffusion) → opens up exciting opportunities to combine both.

It’s exciting to witness all these breakthroughs in text-to-image generation 🚀
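
For reference, the FID (Fréchet Inception Distance) mentioned in the highlights compares the statistics of real and generated images in the feature space of a pretrained image classifier. Below is the standard definition, where the symbols are our notation: \mu_r, \Sigma_r are the mean and covariance of features from real images, and \mu_g, \Sigma_g those from generated images. Lower values mean the two distributions are closer.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \mathrm{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

"Zero-shot" here means the score is computed on a benchmark the model was not trained on.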

👉 Click here to learn more about Parti https://bit.ly/3I4lMxe
