This Hackathon App Lets Instagram Posts Talk Back Using AI

Written by gokulsrinaths | Published 2025/05/29
Tech Story Tags: ai | social-media | ai-in-social-media | ai-for-social-media | ai-powered-social-media-reply | triple-tap-ai-interaction | gesture-based-ai-interface | talk-to-instagram-posts

TL;DR: Installama is an AI overlay that lets users triple-tap any post and open an instant conversational window.

👋 Why We Built This

Social platforms make it easy to react — but not to respond.

You double-tap to like.
You comment if you have time.
But what if you could just ask the post a question, instantly?

When I was selected for Meta’s invite-only LlamaCon Hackathon, I teamed up with fellow engineer Rashmi to test an idea we couldn’t stop thinking about: what if posts could talk back?

We built Installama, an AI overlay that lets users triple-tap any post — image, caption, meme — and open an instant conversational window powered by Meta’s LLaMA API.

🔄 How Installama Works: 🎥 Watch the full demo

The UX is natural and invisible:

  • Double tap → Like (as usual)
  • Triple tap → AI wakes up and talks to the post

The AI response is:

  • Context-aware (reads caption + tags)
  • Conversational (answers or reacts)
  • Fast and expressive

You no longer need to comment or DM. You just tap — and the post responds like a person.

🧱 The Tech Stack

We collaborated on design and UX, but I led the engineering side:

  • Meta’s LLaMA API – for generating fast, context-aware responses
  • Next.js + Tailwind – for clean frontend and mobile-style tapping logic
  • Supabase – for storing taps, metadata, and AI session states
  • Dynamic prompt chaining – pulling metadata (caption, category, tags) into each prompt to give LLaMA richer context

It works for both logged-in and guest users. No typing required. Just tap.

🏆 What Happened at Meta’s LlamaCon

This was Meta’s first official hackathon showcasing its powerful LLaMA API.

Our project, Installama, was:

  • ✅ Accepted into the hackathon after a selective application process
  • ✅ Publicly showcased on the official Cerebral Valley LlamaCon page
  • ✅ Reviewed by Meta engineers and Cerebral Valley organizers
  • ✅ Awarded API power-user status based on technical implementation

This recognition validated that our project wasn’t just cool — it was forward-thinking.

💡 Why Installama Matters

Most LLM products treat AI like a chatbot-in-a-box.

Installama flips that by embedding AI into natural user behavior — tapping. It’s:

  • Seamless
  • Frictionless
  • Intent-driven

And most importantly, it opens the door for gesture-based AI interfaces — a powerful new way to make LLMs feel human.

Imagine:

  • Talking to a meme

  • Triple-tapping a headline to ask it questions

  • Commenting without commenting

It’s not chat — it’s interaction.

🎯 What’s Next

We’re evolving Installama into a full-featured AI UX framework. My current roadmap includes:

  • 🎙️ Voice-triggered taps – for accessibility and gesture-free interaction
  • 🧠 Reaction history – letting users see what others asked or how the post responded
  • ✍️ Creator-mode replies – where influencers can pre-train the AI to respond in their voice

🔗 Under the Hood

👇 Key Features

  • Triple-tap gesture triggers AI overlay

  • Meta’s LLaMA API generates human-like responses

  • Prompt chaining incorporates caption, tags, and image analysis

  • Works for guest and logged-in users

  • Streaming response via Server-Sent Events (SSE)

  • Frontend fails over to client-side analysis if backend is unavailable

🧱 Architecture Overview

[User Triple Taps Post]
          ↓
[Frontend: Tap Handler]
          ↓
[Supabase Logs Tap + Session]
          ↓
[Backend: Gemini Vision → Prompt Chain → LLaMA API]
          ↓
[Frontend: Render AI Response Overlay]

⚙️ Tap Detection Logic

A React Native hook detects triple taps using a 300 ms window between taps; if the next tap doesn’t arrive in time, the count resets.

// useTripleTap.js
import { useState, useRef, useCallback, useEffect } from 'react';
import { State } from 'react-native-gesture-handler';
import * as Haptics from 'expo-haptics';

export const useTripleTap = (onTripleTapCallback) => {
  const [tapCount, setTapCount] = useState(0);
  const lastTapTimeRef = useRef(0);
  const MAX_DELAY = 300; // ms allowed between consecutive taps

  const onHandlerStateChange = useCallback((event) => {
    if (event.nativeEvent.state === State.ACTIVE) {
      const now = Date.now();
      // Reset the count if the previous tap was too long ago
      if (now - lastTapTimeRef.current > MAX_DELAY) setTapCount(1);
      else setTapCount((prev) => prev + 1);
      lastTapTimeRef.current = now;

      // tapCount still holds the pre-update value, so 2 means this is the third tap
      if (tapCount === 2) {
        setTapCount(0);
        Haptics.impactAsync(Haptics.ImpactFeedbackStyle.Medium);
        onTripleTapCallback();
      }
    }
  }, [tapCount, onTripleTapCallback]);

  // Clear a stale tap count if no follow-up tap arrives within the window
  useEffect(() => {
    if (tapCount > 0) {
      const timer = setTimeout(() => setTapCount(0), MAX_DELAY);
      return () => clearTimeout(timer);
    }
  }, [tapCount]);

  return onHandlerStateChange;
};
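
For context, here is how the hook might be attached to a post: a minimal sketch of my own (component and prop names are illustrative, not from the project), using react-native-gesture-handler's TapGestureHandler.

// PostCard.jsx (illustrative wiring, not the actual Installama component)
import React from 'react';
import { Image } from 'react-native';
import { TapGestureHandler } from 'react-native-gesture-handler';
import { useTripleTap } from './useTripleTap';

export const PostCard = ({ post, onOpenOverlay }) => {
  // Fires onOpenOverlay after three taps within the 300 ms window
  const onHandlerStateChange = useTripleTap(() => onOpenOverlay(post));

  return (
    <TapGestureHandler onHandlerStateChange={onHandlerStateChange}>
      <Image
        source={{ uri: post.imageUrl }}
        style={{ width: '100%', aspectRatio: 1 }}
      />
    </TapGestureHandler>
  );
};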

🔐 Supabase Session Logging

Stores tap data and session metadata.

await supabase.from('tap_events').insert([
  {
    post_id: postId,
    tap_type: 'triple',
    session_id: sessionId,
    user_id: user?.id,
    timestamp: new Date().toISOString()
  }
]);
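
To restore session state on the next interaction, the same table can be read back. A minimal sketch, assuming the tap_events columns used in the insert above:

// Load the most recent taps for this session to rebuild AI session state
const { data: recentTaps, error } = await supabase
  .from('tap_events')
  .select('post_id, tap_type, timestamp')
  .eq('session_id', sessionId)
  .order('timestamp', { ascending: false })
  .limit(10);

if (error) console.error('Failed to load tap history:', error);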

🧠 Prompt Chaining with LLaMA

The chain runs System → Vision → Prompt → LLaMA: each prompt carries the post’s caption, tags, and Gemini image analysis.

const messages = [
  {
    role: 'system',
    content: `You are Installama, an AI that replies like a social media post. You have access to Gemini image analysis:\n${imageAnalysis}\nCaption: ${caption}\nTags: ${hashtags}`
  },
  {
    role: 'user',
    content: question || "What's interesting about this post?"
  }
];

const response = await axios.post(LLAMA_API_URL, {
  model: "Llama-4-Maverick-17B-128E-Instruct-FP8",
  messages: messages
}, {
  headers: {
    'Authorization': `Bearer ${LLAMA_API_KEY}`,
    'Content-Type': 'application/json'
  }
});
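
Rendering the reply means pulling the text out of the response. The exact field path depends on the API’s response schema, so this sketch hedges between two common shapes; both paths are assumptions, not confirmed from the project.

// Extract the reply text; both field paths below are assumed response schemas
const data = response.data;
const replyText =
  data?.completion_message?.content?.text   // Llama API style (assumed)
  ?? data?.choices?.[0]?.message?.content   // OpenAI-compatible style (assumed)
  ?? '';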

🧬 Image Analysis (Gemini Vision API)

The post image is converted to base64, and Gemini Vision returns structured text describing its subject, mood, and tone.

// Assumes the @google/generative-ai SDK and a GEMINI_API_KEY env var
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

const result = await model.generateContent({
  contents: [{
    role: "user",
    parts: [
      { text: "Describe this image in detail..." },
      {
        inlineData: {
          data: base64Image,
          mimeType
        }
      }
    ]
  }]
});

const imageAnalysis = result.response.text().trim();

📡 Streaming Responses (SSE)

AI responses are streamed to the client in small word chunks over Server-Sent Events.

// Express endpoint
app.post('/api/analyze-image-stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const response = await streamLlamaResponse(...);

  // Send the reply in 3-word chunks, ~100 ms apart, as SSE frames
  const words = response.split(' ');
  for (let i = 0; i < words.length; i += 3) {
    const chunk = words.slice(i, i + 3).join(' ');
    res.write(`data: ${JSON.stringify({ type: 'chunk', text: chunk })}\n\n`);
    await new Promise(r => setTimeout(r, 100));
  }

  res.end(); // close the stream once all chunks are sent
});
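
On the client, EventSource only supports GET, so the stream can be read with fetch and a reader instead. A hedged sketch: the request payload mirrors the /api/analyze-image call in the fallback section, the parsing is simplified, and a production version would buffer partial frames across reads.

// Read the SSE stream from the POST endpoint, chunk by chunk
const res = await fetch('/api/analyze-image-stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ imageUrl, caption }) // assumed payload shape
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let answer = '';

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // Each frame looks like: data: {"type":"chunk","text":"..."}
  for (const line of decoder.decode(value).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = JSON.parse(line.slice(6));
    if (payload.type === 'chunk') {
      answer += (answer ? ' ' : '') + payload.text;
      // update the overlay with the partial answer here
    }
  }
}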

🧾 Frontend Fallbacks

If the backend request fails, the Gemini analysis runs client-side instead.

try {
  const response = await axios.post('/api/analyze-image', { imageUrl, caption });
  return response.data;
} catch {
  const imageAnalysis = await GeminiVisionService.analyzeImageFromUrl(imageUrl);
  return {
    success: true,
    response: `Image analysis: ${imageAnalysis}`,
    source: 'client-side'
  };
}

🔒 Safety & Moderation

  • Guest sessions are anonymous
  • Moderation via OpenAI’s API (planned)
  • Rate limits to prevent abuse (see the sketch below)
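
For the rate limiting piece, middleware like express-rate-limit on the AI endpoints would do the job. A minimal sketch, my assumption rather than the middleware Installama actually uses:

// Basic per-IP rate limiting on the AI endpoints (illustrative values)
const rateLimit = require('express-rate-limit');

const aiLimiter = rateLimit({
  windowMs: 60 * 1000,   // 1-minute window
  max: 20,               // at most 20 AI requests per IP per window
  standardHeaders: true, // report limits via RateLimit-* headers
  legacyHeaders: false
});

app.use('/api/analyze-image', aiLimiter);
app.use('/api/analyze-image-stream', aiLimiter);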

Installama isn’t just a chatbot — it’s a gesture-first AI interaction layer.
No input box. No commands. Just tap.

🧾 Tags

#llama #meta #llm #socialai #ux #frontend #tripleTap #promptengineering #humanai #llamacon


Written by gokulsrinaths | AI Lead
Published by HackerNoon on 2025/05/29