Site Reliability Engineer – Application Data Services


Job title: Site Reliability Engineer – Application Data Services 
Location: Bucharest, Hybrid
Type: employment contract


Kubernetes, Prometheus / Graphite / Kibana
Java / Python / Go, Node.js, Perl / Ruby


Contributing to a high-scale, complex, world-renowned product and seeing the real-time impact of your work on millions of travelers worldwide
Working in a fast-paced and performance-driven culture
Technical, behavioral and interpersonal competence advancement via on-the-job opportunities, experimental projects, hackathons, conferences and active community participation
Competitive compensation and benefits package
Vast amounts of data to validate your ideas and the opportunity to experiment with real users


Our client is a newly established Center of Excellence based in Bucharest, Romania, created to support the growing business.
The Center of Excellence provides access to specialised and highly skilled talent, leading industry best practices, and collaboration opportunities across all of our client's brands. As part of our client's Romania team, you will have the opportunity to be a part of the world's leading provider of online travel, with a mission of making it easier for everyone to experience the world through six primary consumer-facing brands.


Do you want to build software that impacts millions of customers around the world, tackling some of the world's most complex ecommerce infrastructure challenges? We are looking for talented Site Reliability Engineers to join our client's Application Data Services (ADS) department in Bucharest. In ADS, the team designs, builds and operates all the technology that our client's product development teams need to deliver great travel products to our customers, including databases, a data streaming platform, and key application services for images, messaging and email infrastructure.
As an SRE in this role, you will own and drive the future of in-memory databases, along with the Queues service, at our client. You will support an existing large-scale, reliable on-prem infrastructure and will build the next-generation infrastructure both on-prem and in the cloud.
Our In-Memory Stores team builds and manages the Queues service, and supports large-scale Redis and Memcached infrastructure, including hundreds of queues and other use cases spanning several hundred servers. We provide client libraries with built-in features like monitoring and authentication, as well as complete self-service management of queues and in-memory DBs for our customers. We are also committed to helping modernize the existing platforms and to developing our cloud strategy and integrations.
We use Puppet, Java, Perl and Node.js. We are pragmatic and passionate about the quality of our product: we strongly believe in unit testing, design documents, and code reviews. As a member of this team, you will own, operate and evolve critical, high-scale services.


Design, develop and implement systems software that improves the stability, scalability, availability and latency of our client's products
Take ownership of one or more services and have the freedom to do what is best for our business and customers
Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again
Build effective monitoring to track the health of your systems, and jump in to handle outages
Build and run capacity tests to manage the growth of your systems
Plan for reliability by designing systems to work across our multinational data centers
Develop tools to help the product development teams successfully deploy thousands of change sets every day
Be an advocate of engineering best practices
Share the on-call rotation and be an escalation contact for incidents
Contribute to our client's growth through interviewing, onboarding, or other recruitment efforts


Solid experience in at least one programming language. We use Java, Python, Go, Node.js, Ruby and Perl
Experience with building, operating and maintaining scalable distributed systems and with operations automation
Experience with Infrastructure as Code technologies
Knowledge of cloud computing fundamentals
Solid foundation in Linux administration and troubleshooting
Understanding of service level agreements and objectives (SLAs and SLOs)
Additional experience in OpenStack, Kubernetes, Networking, Security or Storage is desirable
Monitoring / observability technologies like Prometheus, Graphite, Grafana, Kibana, Elasticsearch are a plus
Good interpersonal skills
Proficient command of the English language, both written and spoken

Apply today

If you meet the minimum requirements and are interested in applying for this position, please send your details to careers@key-talents.com with “Site Reliability Engineer – Application Data Services” in the subject line.