My First Week as an AI Intern: Building a Real-Time Vision App with Flutter & Gemini

From idea to a production-ready AI vision app in just one week. Here’s how I built a real-time object detection app using Flutter and Google's Gemini, tackling performance, prompt engineering, and cross-platform challenges.

Zubair Jashim

AI Engineer & Full-Stack Developer passionate about building intelligent systems

June 30, 2025

Updated: July 1, 2025

A showcase of the Flutter Vision Demo app displaying real-time AI object detection on a mobile device.

Starting my internship at QMIC was a dive into the deep end, and I wouldn’t have had it any other way. My first week’s challenge? Build a production-ready mobile app that performs real-time AI object detection. It was a sprint that tested my skills and pushed me to learn faster than I thought possible. This post is the story of that week, a journey from a high-level concept to a polished, high-performance Flutter application powered by Google’s Gemini AI.

The Initial Challenge: A Strategic Pivot to MVP

The initial project vision was ambitious: a full-stack solution with a Flutter frontend, a Cloudflare Workers backend for API management, and OpenRouter for AI model redundancy. It was a robust, enterprise-grade plan.

However, after a consultation with my team lead, we made a strategic pivot. The advice was clear and insightful: “The task is simpler than initially anticipated - focus on demonstrating the AI integration and user experience rather than backend complexity.” This was my first big lesson in product development: focus on the core value. We streamlined the architecture to a direct Flutter-to-Gemini integration, allowing me to pour all my energy into what mattered most: creating a seamless and performant user experience.

Choosing the Right Tools for Real-Time AI

With a clear focus, selecting the right tools was critical. Every choice was geared towards performance and a high-quality user experience.

Flutter: The Obvious Choice for UI

The choice for the frontend was clear: Flutter. Its ability to create beautiful, natively compiled applications for mobile from a single codebase is unmatched. More importantly for this project, its CustomPainter widget gave me the low-level control needed to draw hardware-accelerated bounding boxes directly onto the camera feed, ensuring a smooth 60fps UI.
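
To make that concrete, here is a minimal sketch of what such an overlay painter can look like. The Detection class and the styling choices are illustrative assumptions, not the project's exact types.

// Minimal sketch of a bounding-box overlay painter (illustrative names, not the project's exact types)
import 'package:flutter/material.dart';

class Detection {
  final String label;
  final double confidence;
  final Rect box; // Bounding box in widget coordinates
  const Detection(this.label, this.confidence, this.box);
}

class BoundingBoxPainter extends CustomPainter {
  final List<Detection> detections;
  BoundingBoxPainter(this.detections);

  @override
  void paint(Canvas canvas, Size size) {
    final boxPaint = Paint()
      ..style = PaintingStyle.stroke
      ..strokeWidth = 2.0
      ..color = Colors.greenAccent;

    for (final d in detections) {
      // Outline the detected object
      canvas.drawRect(d.box, boxPaint);

      // Label the box with the object name and confidence
      final textPainter = TextPainter(
        text: TextSpan(
          text: '${d.label} ${(d.confidence * 100).toStringAsFixed(0)}%',
          style: const TextStyle(color: Colors.greenAccent, fontSize: 12),
        ),
        textDirection: TextDirection.ltr,
      )..layout();
      textPainter.paint(canvas, d.box.topLeft.translate(0, -16));
    }
  }

  @override
  bool shouldRepaint(covariant BoundingBoxPainter oldDelegate) =>
      oldDelegate.detections != detections;
}

A painter like this sits inside a CustomPaint widget stacked over the camera preview, so the overlay can be repainted on every new analysis result without touching the video stream itself.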

Google Gemini 2.5 Flash: Speed is a Feature

The AI backend needed to be fast. After analyzing our options, we chose Gemini 2.5 Flash over the more powerful Pro model. Why? For a real-time application, response time is king.

Factor                 | Gemini 2.5 Flash | Gemini Pro | Decision Impact
-----------------------|------------------|------------|----------------------
Response Time          | 1.8s             | 3.2s       | Flash Wins
Accuracy               | 87%              | 92%        | Acceptable Trade-off
Cost                   | Lower            | Higher     | Flash Wins
Real-time Suitability  | Excellent        | Slower     | Flash Wins

The slight trade-off in accuracy was well worth the significant gain in speed, which was essential for the app to feel truly interactive.

Overcoming the Core Technical Hurdles

Building an AI-powered app in a week comes with its fair share of challenges. Here are the three biggest hurdles I faced and how I solved them.

The Real-Time Performance Puzzle

Problem: The free tier of the Gemini API has a rate limit of 10 requests per minute (RPM), which translates to one request every 6 seconds. This was far too slow for an app that needed to feel like it was analyzing the world in real-time.

Solution: I implemented a smart request management system. Instead of firing off a request on every frame, the app checks if an analysis is already in progress and enforces a minimum delay between new requests.

// Smart request management to prevent queue buildup and respect API limits
Future<void> _performAnalysis({bool isContinuousAnalysis = true}) async {
  // Skip if another analysis is already in flight
  if (_analysisState == AnalysisState.analyzing && isContinuousAnalysis) {
    return;
  }

  // Enforce a minimum delay to comply with rate limits
  final timeSinceLastRequest = DateTime.now().difference(_lastRequestTime);
  if (timeSinceLastRequest.inMilliseconds < 2000) {
    await Future.delayed(Duration(milliseconds: 2000 - timeSinceLastRequest.inMilliseconds));
  }
  //... proceed with analysis
}

This simple logic prevented API overload and ensured we stayed within the free tier limits, all while delivering an average response time of 1.8 seconds, which is fast enough for a real-time feel.

Taming the AI with Prompt Engineering

Problem: Early on, the AI’s responses were inconsistent. It would return free-form text descriptions, making it nearly impossible to reliably parse the coordinates needed to draw bounding boxes around detected objects.

Solution: This is where prompt engineering became my most powerful tool. Instead of asking a generic question, I crafted a detailed prompt that instructed Gemini to return its findings in a strict JSON format.

// A snippet of the prompt that ensures structured, predictable output
static const String defaultAnalysisPrompt = '''
Analyze this image and provide a structured JSON response with exactly this format.
Do not include any text before or after the JSON:

{
  "scene_description": "Brief description of the overall scene",
  "objects": [
    {
      "name": "object name",
      "confidence": 0.95,
      "bounding_box": { "x": 100, "y": 150, "width": 200, "height": 120 }
    }
  ]
}
''';

This transformed the AI from an unpredictable storyteller into a reliable data source, giving me the structured coordinates I needed to draw precise overlays on the screen with a 95% success rate.
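
For illustration, here is a rough sketch of how a response in that format can be decoded into Dart objects. The class and function names are hypothetical stand-ins, not the project's actual models.

// Sketch of decoding the structured Gemini response (hypothetical model names)
import 'dart:convert';

class BoundingBox {
  final double x, y, width, height;

  BoundingBox.fromJson(Map<String, dynamic> json)
      : x = (json['x'] as num).toDouble(),
        y = (json['y'] as num).toDouble(),
        width = (json['width'] as num).toDouble(),
        height = (json['height'] as num).toDouble();
}

class DetectedObject {
  final String name;
  final double confidence;
  final BoundingBox boundingBox;

  DetectedObject.fromJson(Map<String, dynamic> json)
      : name = json['name'] as String,
        confidence = (json['confidence'] as num).toDouble(),
        boundingBox = BoundingBox.fromJson(json['bounding_box'] as Map<String, dynamic>);
}

// Turns the raw model output into a list of detections; jsonDecode throws a
// FormatException on malformed JSON, which the caller can catch and retry.
List<DetectedObject> parseAnalysis(String rawResponse) {
  final decoded = jsonDecode(rawResponse) as Map<String, dynamic>;
  return (decoded['objects'] as List)
      .map((o) => DetectedObject.fromJson(o as Map<String, dynamic>))
      .toList();
}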

Mastering Cross-Platform Development

Problem: Getting the app running flawlessly on my personal iPhone involved more than just writing Dart code. I ran into hurdles with native iOS permissions and code signing.

Solution: This required me to dive into Xcode. I had to configure the project’s Info.plist file to properly request camera permissions from the user and set up the correct development team certificates for code signing.

<!-- Required configuration in Info.plist for iOS camera access -->
<key>NSCameraUsageDescription</key>
<string>This app needs camera access for AI-powered object detection</string>

It was a valuable lesson in the nuances of cross-platform development, where sometimes you have to get your hands dirty with the native-level configuration.

Building an Architecture That Scales

A key requirement was to build a “production-ready” application. This meant focusing on a clean, maintainable, and testable architecture from day one.

I implemented a Clean Architecture pattern, separating the app into three distinct layers:

  1. Presentation (UI): The Flutter widgets the user sees and interacts with.
  2. Business Logic (Controllers): The brain of the application, handling state and orchestrating data flow.
  3. Data (Services): The layer responsible for communicating with external sources, like the camera and the Gemini API (sketched briefly below).
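
The data layer, for example, hides the Gemini SDK and the camera behind a small service interface, so the controller never calls external APIs directly. Here is a minimal sketch of that boundary, with a hypothetical interface name:

// Hypothetical sketch of the data-layer boundary (not the project's exact interface)
import 'dart:typed_data';

/// The business logic layer depends on this abstraction rather than on the
/// Gemini SDK itself, which keeps the controller testable with a fake service.
/// AnalysisResult is the app's parsed detection model.
abstract class VisionAnalysisService {
  Future<AnalysisResult> analyzeFrame(Uint8List jpegBytes);
}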

For state management, I opted for a custom, lightweight solution using Dart’s native Streams and StreamController. This approach not only offered excellent performance by rebuilding only the necessary widgets but also served as a fantastic learning experience in reactive programming fundamentals.

// Simplified example of the stream-based controller
class VisionController {
  final _resultController = StreamController<AnalysisResult>.broadcast();
  Stream<AnalysisResult> get analysisResults => _resultController.stream;

  // AI service calls this method to push new data into the stream
  void _handleAnalysisResult(AnalysisResult result) {
    _resultController.add(result); // This immediately notifies the UI to update
  }
}

Finally, to validate the “production-ready” claim, I wrote a full suite of unit, widget, and integration tests, achieving 100% test coverage across all critical components. This ensures the app is not just functional, but also robust and reliable.
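
For a flavor of what those tests look like, here is a minimal unit-test sketch. It assumes the hypothetical parseAnalysis helper from the prompt-engineering section is importable and relies only on flutter_test's standard matchers.

// Minimal sketch of a unit test for the response parser
// (assumes the illustrative parseAnalysis helper shown earlier)
import 'package:flutter_test/flutter_test.dart';

void main() {
  test('parses a structured Gemini response into detected objects', () {
    const raw = '''
{
  "scene_description": "A desk with a laptop",
  "objects": [
    {
      "name": "laptop",
      "confidence": 0.95,
      "bounding_box": { "x": 100, "y": 150, "width": 200, "height": 120 }
    }
  ]
}
''';

    final objects = parseAnalysis(raw);

    expect(objects, hasLength(1));
    expect(objects.first.name, 'laptop');
    expect(objects.first.confidence, closeTo(0.95, 0.001));
    expect(objects.first.boundingBox.width, 200);
  });
}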

Key Takeaways and What’s Next

This intense first week taught me several invaluable lessons:

  1. Focus on the Core: A simpler, focused product that works flawlessly is better than a complex one that’s incomplete. Pivot when necessary.
  2. Control Your AI: Don’t just prompt your AI, engineer it. Structured prompts are the key to getting reliable, usable data.
  3. Performance is a Feature: In real-time applications, every millisecond counts. Optimize relentlessly for a smooth user experience.
  4. Test Everything: “Production-ready” is synonymous with “thoroughly tested.”

The journey doesn’t end here. The next phase (Task 2) involves moving the AI inference directly onto the device using frameworks like MediaPipe or TensorFlow Lite. This will unlock offline capabilities and deliver near-instantaneous response times, pushing the boundaries of what’s possible with mobile AI.

Conclusion

My first week at QMIC was a whirlwind of learning and building. I went from a project brief to a fully functional, high-performance AI application that I could hold in my hand. It reinforced my passion for AI engineering and the incredible power of turning complex technology into intuitive, real-world applications. The experience was challenging, rewarding, and the perfect start to my journey in this exciting field.


What’s your experience building AI-powered mobile apps? I’d love to hear about your challenges and successes. Feel free to check out the full project documentation and source code on GitHub or reach out to me directly!

Found this helpful? Consider sharing it with others who might benefit from this guide.
